Abstract

We study the power of Arthur-Merlin probabilistic proof systems in the data stream
model. We show a canonical AM streaming algorithm for a wide class of data stream
problems. The algorithm offers a tradeoff between the length of the proof and the space
complexity that is needed to verify it.

As an application, we give an AM streaming algorithm for the Distinct Elements
problem. Given a data stream of length $m$ over alphabet of size $n$, the algorithm uses
$\tilde{O}(s)$ space and a proof of size $\tilde{O}(w)$, for every $s, w$ such that $s \cdot w \geq n$ (where $\tilde{O}$ hides a
$\mathrm{polylog}(m, n)$ factor). We also prove a lower bound, showing that every MA streaming
algorithm for the Distinct Elements problem that uses $s$ bits of space and a proof of
size $w$ satisfies $s \cdot w = \Omega(n)$.

As a part of the proof of the lower bound for the Distinct Elements problem, we
show a new lower bound of $\Omega(\sqrt{n})$ on the MA communication complexity of the Gap
Hamming Distance problem, and prove its tightness.
1 Introduction

The data stream computational model is an abstraction commonly used for algorithms that
process network traffic using sublinear space [AMS96, IW03, CCM09]. In the settings of this
model, we have an algorithm that gets a sequence of elements (typically, each element is an
integer) as input. This sequence of elements is called a data stream and is usually denoted
by $\sigma = (a_1, \ldots, a_m)$, where $a_1$ is the first element, $a_2$ is the second element, and so forth.
The algorithm receives its input (a data stream) element-by-element. After it sees each $a_i$,
it no longer has access to elements with index smaller than $i$. The algorithm is
required to compute a function of the data stream, using as little space as possible.



*Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot
76100, Israel. E-mail: {tom.gur, ran.raz}@weizmann.ac.il. Research supported by an Israel Science
Foundation grant and by the I-CORE Program of the Planning and Budgeting Committee and the Israel
Science Foundation.






Among the most fundamental problems in the data stream model is the problem of
Distinct Elements, i.e., the problem of computing the number of distinct elements in a given
data stream. The problem has been studied extensively in the last two decades (see, for
example, [AMS96, IW03, KNW10]). Its significance stems both from the vast variety of
applications that it spans (covering IP routing, database operations and text compression,
cf. [Mut05, AMS96, GKG05]), and from the theoretical insight that it gives on the nature
of computation in the data stream model.

Alon et al. [AMS96] have shown a lower bound of $\Omega(n)$ (where $n$ is the size of the alphabet
from which the elements are taken) on the streaming complexity of the computation of the
exact number of distinct elements in a sufficiently long data stream (i.e., where the length of
the data stream is at least proportional to $n$). The goal of reducing the space complexity of
the Distinct Elements problem has led to a long line of research of approximation algorithms
for the problem, starting with the seminal paper [FM83] by Flajolet and Martin. Recently,
Kane et al. [KNW10] gave the first optimal approximation algorithm for estimating the
number of distinct elements in a data stream; for a data stream with alphabet of size $n$, given
$\epsilon > 0$ their algorithm computes a $(1 \pm \epsilon)$ multiplicative approximation using $O(\epsilon^{-2} + \log n)$
bits of space, with 2/3 success probability.

A natural approach for reducing the space complexity of streaming algorithms, without
settling for an approximation, is to consider a probabilistic proof system. Chakrabarti et
al. [CCM09] have shown data stream with annotations algorithms for several data stream
problems, using a probabilistic proof system that is very similar to MA. This line of work
continued in [CMT10], wherein a probabilistic proof system was used in order to reduce
the streaming complexity of numerous graph problems. In a subsequent work [CMT11],
Chakrabarti et al. provided a practical instantiation of one of the most efficient general-purpose
constructions of an interactive proof for arbitrary computations, due to Goldwasser
et al. [GKR08].



In this work, we study the power of Arthur-Merlin probabilistic proof systems in the
data stream model. We show a canonical AM streaming algorithm for a wide class of data
stream problems. The algorithm offers a tradeoff between the length of the proof and the
space complexity that is needed to verify it. We show that the problem of Distinct Elements
falls within the class of problems that our canonical algorithm can handle. Thus, we give an
AM streaming algorithm for the Distinct Elements problem. Given a data stream of length
$m$ over alphabet of size $n$, the algorithm uses $\tilde{O}(s)$ space and a proof of size $\tilde{O}(w)$, for every
$s, w$ such that $s \cdot w \geq n$ (where $\tilde{O}$ hides a $\mathrm{polylog}(m, n)$ factor).

In addition, we give a lower bound on the MA streaming complexity of the Distinct 
Elements problem. Our lower bound for Distinct Elements relies on a new lower bound that 
we prove on the MA communication complexity of the Gap Hamming Distance problem. 



1.1 Arthur-Merlin Probabilistic Proof Systems 

An MA (Merlin-Arthur) proof is a probabilistic extension of the notion of proof in complexity
theory. Proofs of this type are commonly described as an interaction between two players,
usually referred to as Merlin and Arthur. We think of Merlin as an omniscient prover, and
of Arthur as a computationally bounded verifier. Merlin is supposed to send Arthur a valid
proof for the correctness of a certain statement. After seeing both the input and Merlin's
proof, with high probability Arthur can verify a valid proof for a correct statement, and
reject every possible alleged proof for a wrong statement.

Formally, the complexity class MA(T, W) is defined as follows: 

Definition 1.1. Let $\epsilon > 0$, and let $T, W : \mathbb{N} \to \mathbb{N}$ be monotone functions. A language $L$ is
in $\mathrm{MA}_{\epsilon}(T, W)$ if there exists a randomized algorithm $V$ (the verifier) that receives an input
$x$ (denote its size by $|x|$) and a proof (sometimes called witness) $w$, such that,

1. Completeness: For every $x \in L$, there exists a string $w$ of length at most $W(|x|)$ that
satisfies

$$\Pr[V(x, w) = 1] \geq 1 - \epsilon.$$

2. Soundness: For every $x \notin L$, and for any string $w$ of length at most $W(|x|)$,

$$\Pr[V(x, w) = 1] \leq \epsilon.$$

3. For every $x, w$ the running time of $V$ on $(x, w)$ is at most $T(|x|)$.

Under these notations, we refer to T as the time complexity of the verifier. The function 
W is referred to as the length of the proof, and the sum T + W is called the MA complexity 
of the algorithm. 

An AM proof is defined almost the same as an MA proof, except that in AM proof
systems we assume that both the prover and the verifier have access to a common source of 
randomness (alternatively, AM proof systems can be described as MA proof systems that 
start with an extra round, wherein Arthur sends Merlin a random string). 

The notion of AM and MA proof systems can be extended to many computational 
models. In this work we consider both the communication complexity analogue of MA, 
wherein Alice and Bob receive a proof that they use in order to save communication, and 
the data stream analogues of MA and AM, wherein the data stream algorithm receives a 
proof and uses it in order to reduce the required resources for solving a data stream problem. 

Recently, probabilistic proof systems for streaming algorithms have been used to provide 
an abstraction of the notion of delegation of computation to a cloud (see [CMT10, CMT11,
CKLR11]). In the context of cloud computing, a common scenario is one where a user
receives or generates a massive amount of data, which he cannot afford to store locally. The 
user can stream the data he receives to the cloud, keeping only a short certificate of the 
data he streamed. Later, when the user wants to calculate a function of that data, the 
cloud can perform the calculations and send the result to the user. However, the user cannot 
automatically trust the cloud (as an error could occur during the computation, or the service 
provider might not be honest). Thus the user would like to use the short certificate that he 
saved in order to verify the answer that he gets from the cloud. 
1.2 Communication Complexity and the Gap Hamming Distance Problem

Communication complexity is a central model in computational complexity. In its basic 
setup, we have two computationally unbounded players, Alice and Bob, holding (respectively) 
binary strings x, y of length n each. The players need to compute a function of both of the 
inputs, using the least amount of communication between them. 

In this work we examine the well known communication complexity problem of Gap
Hamming Distance (GHD), wherein each of the two parties gets an $n$ bit binary string, and
together the parties need to tell whether the Hamming distance of the strings is larger than
$\frac{n}{2} + \sqrt{n}$ or smaller than $\frac{n}{2} - \sqrt{n}$ (assuming that one of the possibilities occurs). In [CR11]
a tight linear lower bound was proven on the communication complexity of a randomized
communication complexity protocol for GHD. Following [CR11], a couple of other proofs
([Vid11, She11]) were given for the aforementioned lower bound. Relying on [She11], in this
work we give a tight lower bound of $\Omega(\sqrt{n})$ on the MA communication complexity of GHD.
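To make the promise in the GHD problem concrete, the following Python sketch classifies a pair of $n$-bit strings by their Hamming distance. The function names and the labels "far" and "close" are illustrative choices, not notation from this paper, and returning `None` outside the promise is an assumption of the sketch.

```python
import math

def hamming_distance(x, y):
    """Number of positions in which two equal-length bit strings differ."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return sum(a != b for a, b in zip(x, y))

def gap_hamming_distance(x, y):
    """Classify (x, y) under the gap promise.

    Returns "far" if the distance exceeds n/2 + sqrt(n), "close" if it is
    below n/2 - sqrt(n), and None if the promise is violated.
    """
    n = len(x)
    d = hamming_distance(x, y)
    if d > n / 2 + math.sqrt(n):
        return "far"
    if d < n / 2 - math.sqrt(n):
        return "close"
    return None  # inputs inside the gap are not part of the problem
```

The difficulty studied in this work lies in the communication setting, where $x$ and $y$ are held by different parties; the sketch above only fixes what has to be computed.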

1.3 Our Results 

The main contributions in this work are: 

1. A canonical AM streaming algorithm for a wide class of data stream problems, including
the Distinct Elements problem.

2. A lower bound on the MA streaming complexity of the Distinct Elements problem.

3. A tight lower bound on the MA communication complexity of the Gap Hamming
Distance problem.

In order to state the results precisely, we first introduce the following notations: given
a data stream $\sigma = (a_1, \ldots, a_m)$ (over alphabet $[n]$), the element indicator $\chi_i : [n] \to \{0, 1\}$
of the $i$'th element ($i \in [m]$) of the stream $\sigma$, is the function that indicates whether a given
element is in position $i \in [m]$ of $\sigma$, i.e., $\chi_i(j) = 1$ if and only if $a_i = j$. Furthermore, let
$\chi : [n] \to \{0, 1\}^m$ be the element indicator of $\sigma$, defined by

$$\chi(j) = (\chi_1(j), \ldots, \chi_m(j)).$$

In addition, given $n \in \mathbb{N}$ we define a clause over $n$ variables $x_1, \ldots, x_n$ as a function
$C : \{0, 1\}^n \to \{0, 1\}$ of the form $(y_1 \vee y_2 \vee \ldots \vee y_n)$, where for every $i \in [n]$ the literal $y_i$ is either
a variable ($x_j$), a negation of a variable ($\neg x_j$), or one of the constants $\{0, 1\}$.
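The two notions just introduced can be illustrated with a short Python sketch. The tuple encoding of literals (`"var"`, `"neg"`, `"const"`) and the function names are assumptions made for this illustration only; positions are 0-based in the code, whereas the text indexes them from 1.

```python
def element_indicator(stream, j):
    """chi(j): the 0/1 vector whose i-th entry is 1 iff the i-th element equals j."""
    return [1 if a == j else 0 for a in stream]

def eval_clause(literals, assignment):
    """Evaluate a disjunction of literals on a 0/1 assignment.

    Each literal is ("var", i), ("neg", i) or ("const", b),
    where i is a 0-based variable index and b is 0 or 1.
    """
    for kind, value in literals:
        if kind == "var" and assignment[value] == 1:
            return 1
        if kind == "neg" and assignment[value] == 0:
            return 1
        if kind == "const" and value == 1:
            return 1
    return 0  # no literal was satisfied
```

For instance, composing the all-variables clause with `element_indicator(stream, j)` yields 1 exactly when $j$ occurs somewhere in the stream.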

Equipped with the notations above, we formally state our results. Let $0 \leq \epsilon < 1/2$. Let
$\mathcal{P}$ be a data stream problem such that for every $m, n \in \mathbb{N}$ there exists a set of $k = k(m, n)$
clauses $\{C_t\}_{t \in [k]}$ over $m$ variables, and a function $\psi : \{0, 1\}^k \to \mathbb{Z}$, such that for every data
stream $\sigma = (a_1, \ldots, a_m)$ with alphabet $[n]$,

$$(1 - \epsilon)\mathcal{P}(\sigma) \leq \sum_{j=1}^{n} \psi\big(C_1 \circ \chi(j), \ldots, C_k \circ \chi(j)\big) \leq (1 + \epsilon)\mathcal{P}(\sigma).$$
Moreover, we assume that $\psi$ and $\{C_t\}_{t \in [k]}$ are known to the verifier,¹ and that there exists
$B \leq \mathrm{poly}(m, n)$ such that $\psi(x) < B$ for every $x \in \{0, 1\}^k$. Given such $\mathcal{P}$, for every $0 < \delta \leq 1$
and every $s, w \in \mathbb{N}$ such that $s \cdot w \geq n$, we give an AM streaming algorithm, with error
probability $\delta$, for approximating $\mathcal{P}(\sigma)$ within a multiplicative factor of $1 \pm \epsilon$. The algorithm
uses space $O(sk \cdot \mathrm{polylog}(m, n, \delta^{-1}))$, a proof of size $W = O(wk \cdot \mathrm{polylog}(m, n, \delta^{-1}))$, and
randomness complexity $\mathrm{polylog}(m, n, \delta^{-1})$.

We show that the aforementioned algorithm, when applied to the Distinct Elements
problem with parameters $s, w$ such that $s \cdot w \geq n$, yields an AM streaming algorithm for
the problem. The algorithm computes, with probability at least 2/3, the exact number of
distinct elements in a data stream of length $m$ over alphabet $[n]$, using space $\tilde{O}(s)$ and a proof
of size $\tilde{O}(w)$ (where $\tilde{O}$ hides a $\mathrm{polylog}(m, n)$ factor). For example, by fixing $w = n$, we get an
AM streaming algorithm for the Distinct Elements problem that uses only polylogarithmic
space.

We note that an interesting special case of the class of problems that our canonical 
AM streaming algorithm handles can also be stated in terms of Boolean circuits, instead of 
clauses. That is, given $0 \leq \epsilon < 1/2$ and a data stream problem $\mathcal{P}$ such that for every $m, n \in \mathbb{N}$
there exists an unbounded fan-in Boolean circuit $C : \{0, 1\}^m \to \{0, 1\}$ with $k = k(m, n)$ non-input
gates, such that for every data stream $\sigma = (a_1, \ldots, a_m)$ with alphabet $[n]$,

$$(1 - \epsilon)\mathcal{P}(\sigma) \leq \sum_{j=1}^{n} C\big(\chi_1(j), \ldots, \chi_m(j)\big) \leq (1 + \epsilon)\mathcal{P}(\sigma),$$

and assuming that $C$ is known to the verifier, we get an AM streaming algorithm for $\mathcal{P}$ with
the same parameters as in the original formulation of the canonical AM algorithm above.

Our next result is a lower bound on the MA streaming complexity of the Distinct Elements
problem. We show that every MA streaming algorithm that approximates, within a
multiplicative factor of $1 \pm 1/\sqrt{n}$, the number of distinct elements in a data stream of length
$m$ over alphabet $[n]$, using $s$ bits of space and a proof of size $w$, must satisfy $s \cdot w = \Omega(n)$.

Last, we show a tight (up to a logarithmic factor) lower bound on the MA communication
complexity of the Gap Hamming Distance problem. For every MA communication
complexity protocol for GHD that communicates $t$ bits and uses a proof of size $w$, we have
$t \cdot w = \Omega(n)$. We prove the tightness of the lower bound by giving, for every $t, w \in \mathbb{N}$ such
that $t \cdot w \geq n$, an MA communication complexity protocol for GHD, which communicates
$O(t \log n)$ bits and uses a proof of size $O(w \log n)$.

1.4 Techniques 

The main intuition behind our canonical AM streaming algorithm is based on the
"algebrization" inspired communication complexity protocol of Aaronson and Wigderson [AW09].
However, our proof is much more technically involved.

¹For example, $\psi$ and $\{C_t\}_{t \in [k]}$ can be $\mathrm{polylog}(m, n)$-space uniform; that is, the description of $\psi$ and
$\{C_t\}_{t \in [k]}$ can be computed by a deterministic Turing machine that runs in $\mathrm{polylog}(m, n)$ space.
In general, say we have a data stream problem $\mathcal{P}$ and two integers $s, w$ such that $s \cdot w \geq n$.
If there exists a low degree polynomial $g(x, y) : \mathbb{Z}^2 \to \mathbb{Z}$ (that depends on the input stream
$\sigma$) and two domains $\mathcal{D}_w, \mathcal{D}_s \subseteq \mathbb{Z}$ of cardinality $w, s$ (respectively) such that

$$\mathcal{P}(\sigma) = \sum_{x \in \mathcal{D}_w} \sum_{y \in \mathcal{D}_s} g(x, y),$$

then assuming we can efficiently evaluate $g$ at a random point, by a straightforward adaptation
of the [AW09] protocol to the settings of streaming algorithms, we obtain a simple MA
streaming algorithm for $\mathcal{P}$.
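The flavor of such a protocol can be conveyed by a minimal Python sketch. It is not the algorithm of this paper: it assumes that the quantity of interest equals the sum of a bivariate polynomial $g(x, y)$ over a $w \times s$ grid, works modulo an arbitrarily chosen prime `P`, takes the domains to be $\{0, \ldots, w-1\}$ and $\{0, \ldots, s-1\}$, and is handed $g$ explicitly rather than through a stream. All function names are hypothetical. Merlin sends the univariate polynomial $h(X) = \sum_{y} g(X, y)$; Arthur compares it with his own evaluation at a random point $r$ and, if they agree, sums $h$ over the first domain.

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; the field size is an illustrative choice

def eval_poly(coeffs, x, p=P):
    """Horner evaluation of sum(coeffs[i] * x**i) modulo p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def make_g(g_coeffs, p=P):
    """g(x, y) = sum of g_coeffs[i][j] * x**i * y**j, reduced modulo p."""
    def g(x, y):
        return sum(c * pow(x, i, p) * pow(y, j, p)
                   for i, row in enumerate(g_coeffs)
                   for j, c in enumerate(row)) % p
    return g

def honest_proof(g_coeffs, s, p=P):
    """Merlin's message: coefficients of h(X) = sum over y in range(s) of g(X, y)."""
    return [sum(c * pow(y, j, p) for j, c in enumerate(row) for y in range(s)) % p
            for row in g_coeffs]

def verify(g, proof, w, s, max_coeffs, r, p=P):
    """Arthur's check; returns the claimed grid sum modulo p, or None to reject."""
    if len(proof) > max_coeffs:
        return None  # the proof is longer than the degree bound allows
    # Arthur only needs the s values g(r, y) at his secret random point r.
    expected = sum(g(r, y) for y in range(s)) % p
    if eval_poly(proof, r, p) != expected:
        return None
    return sum(eval_poly(proof, x, p) for x in range(w)) % p

# Example run with a random challenge (r must stay hidden from Merlin).
g_coeffs = [[1, 0, 1], [2, 3, 0]]  # g(x, y) = 1 + y^2 + 2x + 3xy
r = random.randrange(P)
result = verify(make_g(g_coeffs), honest_proof(g_coeffs, 4), 3, 4, 2, r)
```

Two distinct polynomials with fewer than `max_coeffs` coefficients agree on fewer than `max_coeffs` points, so a wrong proof survives the check only with probability below `max_coeffs / P` over the choice of `r`. The proof length scales with the degree in $x$ and Arthur's work with $s$, which is the source of the tradeoff between $w$ and $s$.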

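However, in our case we can only express $\mathcal{P}(\sigma)$ as

$$\mathcal{P}(\sigma) = \sum_{x \in \mathcal{D}_w} \sum_{y \in \mathcal{D}_s} \psi\big(C_1 \circ \tilde{\chi}(x, y), \ldots, C_k \circ \tilde{\chi}(x, y)\big),$$

where $k$ is a natural number, $\{C_t\}_{t \in [k]}$ are clauses over $m$ variables, $\psi : \{0, 1\}^k \to \mathbb{Z}$ is a
function over the hypercube, $\tilde{\chi} : \mathcal{D}_w \times \mathcal{D}_s \to \{0, 1\}^m$ is the bivariate equivalent of the element
indicator $\chi : [n] \to \{0, 1\}^m$, and $\mathcal{D}_w, \mathcal{D}_s \subseteq \mathbb{Z}$ are domains of cardinality $w, s$ (respectively).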

The function $\psi\big(C_1 \circ \tilde{\chi}(x, y), \ldots, C_k \circ \tilde{\chi}(x, y)\big)$ is not a low degree polynomial. We would
have liked to overcome this difficulty by using the approximation method of [Raz87, Smo87].
The latter allows us to have a low degree approximation of the clauses $\{C_t\}_{t \in [k]}$, such that
with high probability (over the construction of the approximation polynomials) we can replace
the clauses with low degree polynomials, without changing the output. The aforementioned
randomized procedure comes at a cost of turning the MA streaming algorithm to an
AM streaming algorithm.

Yet, the above does not sufficiently reduce the degree of $\psi\big(C_1 \circ \tilde{\chi}(x, y), \ldots, C_k \circ \tilde{\chi}(x, y)\big)$.
This is due to the fact that the method of [Raz87, Smo87] results in approximation polynomials
over a finite field of cardinality that is larger than $\mathcal{P}(\sigma)$. The degree of the approximation
polynomials is close to the cardinality of the finite field, which in our case can be a
large number ($\mathrm{poly}(m, n)$).

Instead we aim to apply the method of [Raz87, Smo87] to approximate

$$\{\mathcal{P}(\sigma) \pmod{q}\}_{q \in Q}$$

for a set $Q$ of $\mathrm{polylog}(m, n)$ primes, each of size at most $\mathrm{polylog}(m, n)$. This way, each
approximation polynomial that we get is over a finite field of cardinality $\mathrm{polylog}(m, n)$, and
of sufficiently low degree. Then, we use the Chinese Remainder Theorem to extract the
value of $\mathcal{P}(\sigma)$ from $\{\mathcal{P}(\sigma) \pmod{q}\}_{q \in Q}$.
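The recombination step can be illustrated by the standard Chinese Remainder construction below. This is a generic sketch rather than code from this work: the function name `crt` is an assumption, and it requires the product of the primes to exceed the value being recovered.

```python
from math import prod

def crt(residues, moduli):
    """Return the unique x in [0, prod(moduli)) with x = residues[i] (mod moduli[i]).

    Assumes the moduli are pairwise coprime (for example, distinct primes).
    """
    modulus = prod(moduli)
    x = 0
    for r, q in zip(residues, moduli):
        partial = modulus // q
        # pow(partial, -1, q) is the inverse of partial modulo q (Python 3.8+).
        x += r * partial * pow(partial, -1, q)
    return x % modulus
```

For example, the residues of 1234 modulo the primes 2, 3, 5, 7 and 11 (whose product is 2310) determine it uniquely, so a value bounded by $\mathrm{poly}(m, n)$ needs only polylogarithmically many small primes.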

Nonetheless, this is still not enough, as for every $q \in Q$ we want the answer to be the
summation of the polynomial approximation of $\psi\big(C_1 \circ \tilde{\chi}(x, y), \ldots, C_k \circ \tilde{\chi}(x, y)\big) \pmod{q}$ over
some domain $\mathcal{D}_w \times \mathcal{D}_s \subseteq \mathbb{Z}^2$ (where $|\mathcal{D}_w| = w$ and $|\mathcal{D}_s| = s$). Since the cardinality of the
field $\mathbb{F}_q$ is typically smaller than $w$ and $s$, we use an extension (of sufficient cardinality) of
the field $\mathbb{F}_q$.

At each step of the construction, we make sure that we preserve both the restrictions
that are imposed by the data stream model, and the conditions that are needed to ensure 
an efficient verification of the proof. 
The idea behind our AM streaming algorithm for Distinct Elements is simply noting
that we can indicate whether an element $j$ appears in the data stream, by the disjunction of
the element indicators of $j \in [n]$ in all of the positions of the stream (i.e., $\chi_1(j), \ldots, \chi_m(j)$).
Then we can represent the number of distinct elements as a sum of disjunctions, and use the
canonical AM streaming algorithm in order to solve the Distinct Elements problem.
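The representation as a sum of disjunctions can be checked directly with a few lines of Python. The sketch below stores the whole stream and makes $n$ passes over it, so it is not a streaming algorithm; it only verifies the identity, and the function name is an illustrative choice.

```python
def distinct_elements_as_sum_of_disjunctions(stream, n):
    """Sum over j in [n] of the disjunction of chi_1(j), ..., chi_m(j)."""
    total = 0
    for j in range(1, n + 1):
        # chi(j), produced entry by entry: 1 where the stream element equals j
        indicators = (1 if a == j else 0 for a in stream)
        total += 1 if any(indicators) else 0
    return total
```

On the stream `[3, 1, 3, 2, 1]` over alphabet $[5]$ the sum has three non-zero terms, one for each of the elements 1, 2 and 3.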

As for the lower bound on the MA streaming complexity of the Distinct Elements problem,
we start by establishing a lower bound on the MA communication complexity of the
Gap Hamming Distance problem (GHD). A key element in the proof of the latter is based
on Sherstov's recent result [She11] on the Gap Orthogonality problem (ORT) and its relation
to GHD. Sherstov observed that the problem of Gap Orthogonality readily reduces to the Gap
Hamming Distance problem. Although at first glance it seems that the transition to ORT
is of little substance, it turns out that Yao's corruption bound [Yao83] suits it perfectly. In
fact, the corruption property for ORT is equivalent to the anti-concentration property of
orthogonal vectors in the Boolean cube. Using this observation, we prove a lower bound
on the MA communication complexity of ORT (following the method of [RS04]), which in
turn, by the reduction from ORT to GHD, implies a lower bound on the MA communication
complexity of GHD. Next we adapt the reduction that was implicitly stated in [IW03]
and reduce the MA communication complexity problem of GHD to the MA problem of
calculating the exact number of Distinct Elements.



1.5 Related Work 

The data stream model has gained a great deal of attention after the publication of the seminal
paper by Alon, Matias and Szegedy [AMS96]. In the scope of that work, the authors
have shown a lower bound of $\Omega(n)$ (where $n$ is the size of the alphabet) on the streaming
complexity of Distinct Elements (i.e., the computation of the exact number of distinct
elements in a data stream) where the length of the input is at least proportional to $n$.

Following [AMS96] there was a long line of theoretical research on the approximation of
the Distinct Elements problem ([BYJK+02, IW03, BHR+07, BC09, KNW10], see [Mut05] for
a survey of earlier results). Finally, Kane et al. [KNW10] gave the first optimal approximation
algorithm for estimating the number of distinct elements in a data stream; for a data
stream with alphabet of size $n$, given $\epsilon > 0$ their algorithm computes a $(1 \pm \epsilon)$ multiplicative
approximation using $O(\epsilon^{-2} + \log n)$ bits of space, with 2/3 success probability. This result
matches the tight lower bound of Indyk and Woodruff [IW03].

In a recent sequence of works, the data stream model was extended to support several
interactive and non-interactive proof systems [CCM09, CMT10, CKLR11]. The model of
streaming algorithms with non-interactive proofs was first introduced in [CCM09] and extended
in [CMT10, CMT11]. In [CCM09] the authors gave an optimal (up to polylogarithmic
factors) data stream with annotations algorithm for computing the $k$'th frequency moment
exactly, for every integer $k \geq 1$.
2 Preliminaries



2.1 Communication Complexity 

Let $X, Y, Z$ be finite sets, and let $f : X \times Y \to Z$ be a (possibly partial) function. In
the two-party probabilistic communication complexity model we have two computationally
unbounded players, traditionally referred to as Alice and Bob. Both players share a random
string. Alice gets as an input $x \in X$. Bob gets as an input $y \in Y$. At the beginning, neither of
the players has any information regarding the input of the other player. Their common goal
is to compute the value of $f(x, y)$, using a protocol that communicates as small a number of
bits as possible. In each step of the protocol, one of the players sends one bit to the other
player. This bit may depend on the player's input, the common random string, as well as on
all previous bits communicated between the two players. At the end of the protocol, both
players have to know the value of $f(x, y)$ with high probability.

2.1.1 MA Communication Complexity 

In MA communication complexity protocols, we have a (possibly partial) function
$f : X \times Y \to \{0, 1\}$ (for some finite sets $X, Y$), and three computationally unbounded parties: Merlin,
Alice, and Bob. The function $f$ is known to all parties. Alice gets as an input $x \in X$. Bob
gets as an input $y \in Y$. Merlin sees both $x$ and $y$. We think of Merlin as a prover, and think
of Alice and Bob as verifiers. We assume that Alice and Bob share a private random string
that Merlin cannot see.

At the beginning of an MA communication complexity protocol, Merlin sends a proof
string $w$ to both Alice and Bob, so both players have free access to $w$. The players proceed
as before. In each step of the protocol, one of the players sends one bit to the other player. At
the end of the protocol, both players have to know an answer $z$. Hence, the answer depends
on the input $(x, y)$ as well as on the proof $w$. For a protocol $P$, denote by $P((x, y), w)$ the
probabilistic answer $z$ given by the protocol on input $(x, y)$ and proof $w$.

An MA communication complexity protocol has three parameters: a limit on the probability
of error of the protocol, denoted by $\epsilon$; a limit on the number of bits of communication
between Alice and Bob, denoted by $T$; and a limit on the length of Merlin's proof string,
denoted by $W$.

With the above in mind, we can now define $\mathrm{MA}_{\epsilon}(T, W)$ communication complexity as
follows:

Definition 2.1. An $\mathrm{MA}_{\epsilon}(T, W)$-communication complexity protocol for $f$ is a probabilistic
communication complexity protocol $P$, as above (i.e., with an additional proof string $w$ presented
to the players). During the protocol, Alice and Bob communicate at most $T$ bits. The
protocol satisfies,

1. Completeness: for all $(x, y) \in f^{-1}(1)$ there exists a string $w$ such that $|w| < W$,
that satisfies

$$\Pr[P((x, y), w) = 1] \geq 1 - \epsilon.$$

2. Soundness: for all $(x, y) \in f^{-1}(0)$ and for any string $w$ such that $|w| < W$, we have

$$\Pr[P((x, y), w) = 1] \leq \epsilon.$$

2.1.2 The Gap Hamming Distance Problem

Let $n \in \mathbb{N}$, and let $\zeta_0, \zeta_1 > 0$. We define the Gap Hamming Distance problem as follows:

Definition 2.2. The Gap Hamming Distance problem is the communication complexity
problem of computing the partial Boolean function $\mathrm{GHD}_{n, \zeta_0, \zeta_1} : \{-1, 1\}^n \times \{-1, 1\}^n \to \{0, 1\}$
given by

$$\mathrm{GHD}_{n, \zeta_0, \zeta_1}(x, y) = \begin{cases} 1 & \text{if } \langle x, y \rangle > \zeta_1 \\ 0 & \text{if } \langle x, y \rangle < -\zeta_0 \end{cases}$$

Denote $\mathrm{GHD} = \mathrm{GHD}_{n, \sqrt{n}, \sqrt{n}}$.

2.2 Streaming Complexity

Let $\epsilon \geq 0$, $\delta > 0$. Let $m, n \in \mathbb{N}$. A data stream $\sigma = (a_1, \ldots, a_m)$ is a sequence of elements,
each from $[n] = \{1, \ldots, n\}$. We say that the length of the stream is $m$, and the alphabet size
is $n$.

A streaming algorithm is a space-bounded probabilistic algorithm that gets an element-by-element
access to a data stream. After each element arrives, the algorithm can no longer
access the elements that precede it. At the end of its run, the streaming algorithm is required
to output (with high probability) a certain function of the data stream that it read. When
dealing with streaming algorithms, the main resource we are concerned with is the size of
the space that the algorithm uses.

Formally, a data stream problem $\mathcal{P}$ is a collection of functions $\{f_{m,n} : [n]^m \to \mathbb{R}\}_{m,n \in \mathbb{N}}$.
That is, a function for every combination of length and alphabet size of a data stream.
However, slightly abusing notation for the sake of brevity, we will define each data stream
problem by a single function (which in fact depends on the length $m$ and alphabet size $n$ of the
data stream). A $\delta$-error, $\epsilon$-approximation data stream algorithm $A_{\epsilon,\delta}$ for $\mathcal{P}$ is a probabilistic
algorithm that gets a sequential, one pass access to a data stream $\sigma = (a_1, \ldots, a_m)$ (where
each $a_i$ is a member of $[n]$), and satisfies:

$$\Pr\left[\left|\frac{A_{\epsilon,\delta}(\sigma)}{f_{m,n}(\sigma)} - 1\right| > \epsilon\right] < \delta.$$

If $\epsilon = 0$ we say that the streaming algorithm is exact.
Last, given a data stream problem $\mathcal{P} = \{f_{m,n} : [n]^m \to \mathbb{R}\}_{m,n \in \mathbb{N}}$ and a data stream
$\sigma = (a_1, \ldots, a_m)$ (with alphabet $[n]$) we denote by $\mathcal{P}(\sigma)$ the output of $f_{m,n}(\sigma)$, for the
$f_{m,n} \in \mathcal{P}$ that matches the length and alphabet size of $\sigma$. Similarly, when applying a
family of functions to $\sigma$, we in fact apply a specific function in the family, according to the
parameters $m, n$ of $\sigma$.

2.2.1 The Distinct Elements Problem 

The Distinct Elements problem is the problem of computing the exact number of distinct
elements that appear in a data stream, denoted by $F_0(\sigma)$. Formally, we define:

Definition 2.3. The Distinct Elements problem is the data stream problem of computing
the exact number of distinct elements in a data stream $\sigma = (a_1, \ldots, a_m)$ (where $a_i \in [n]$ for
every $i$), i.e., computing (exactly):

$$F_0(\sigma) = \left|\{i \in \mathbb{N} : \exists j \in [m],\ a_j = i\}\right|.$$

Note that if we define $0^0 = 0$ then this is exactly the $0$'th frequency moment of the
stream. Hence the notation $F_0$.
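For reference, the trivial exact one-pass algorithm is sketched below in Python. It keeps one flag per alphabet symbol, so its space is linear in $n$; this is the baseline against which the proof-assisted algorithms of this work should be compared. The function name is an illustrative choice, and a `bytearray` is used for simplicity even though one bit per symbol would suffice.

```python
def exact_distinct_elements(stream, n):
    """One-pass exact F_0 using a table with one flag per symbol of [n]."""
    seen = bytearray(n + 1)  # seen[j] == 1 iff j has appeared; index 0 is unused
    count = 0
    for a in stream:
        if not 1 <= a <= n:
            raise ValueError("element outside the alphabet [n]")
        if not seen[a]:
            seen[a] = 1
            count += 1
    return count
```

The function accepts any iterable, so it can be fed a generator that yields the stream element by element without storing it.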

3 Streaming Algorithms with Probabilistic Proof Systems 

In this section we extend the data stream computational model in order to support two
types of probabilistic proof systems: MA algorithms, wherein the streaming algorithm gets
a proof that it probabilistically verifies, and AM algorithms that extend MA algorithms
by adding shared randomness. We study both of these probabilistic proof systems in two
variations: in the first, the proof is also being streamed to the verifier, and in the second,
the verifier has free access to the proof. Formal definitions follow.

3.1 MA Streaming Algorithms 

Similarly to the way MA communication complexity protocols are defined, in MA streaming
algorithms we have an omniscient prover (Merlin) who sends a proof to a verifier (Arthur),
which is in fact a streaming algorithm that gets both the input stream and the proof (either
by a free access or by a one-pass, sequential access). The streaming algorithm computes a
function of the input stream. Using the proof we hope to achieve a better space complexity
than what the regular streaming model allows.

We start with MA proofs wherein the proof is being streamed to the verifier. Formally,
we define

Definition 3.1. Let $\epsilon \geq 0$, $\delta > 0$, and let $\mathcal{P} = \{f_{m,n} : [n]^m \to \mathbb{R}\}_{m,n \in \mathbb{N}}$ be a data stream
problem. An MA streaming algorithm for $\mathcal{P}$ is a probabilistic data stream algorithm $A$, which
simultaneously gets two streams: an input stream $\sigma = (a_1, \ldots, a_m)$ (where $a_i \in [n]$ for every
$i$) and a proof stream $\omega$; to both it has a sequential, one pass access. Given two functions
$S, W : \mathbb{N}^2 \to \mathbb{N}$, we say that an MA streaming algorithm is $\mathrm{MA}_{\epsilon,\delta}(S(m, n), W(m, n))$ if it
uses at most $S(m, n)$ bits of space, and satisfies:

1. Completeness: for every $\sigma = (a_1, \ldots, a_m)$ (with alphabet $[n]$) there exists a non-empty
set $\mathcal{W}_\sigma$ of proof streams of length at most $W(m, n)$, such that for every $\omega \in \mathcal{W}_\sigma$
we have,

$$\Pr\left[\left|\frac{A(\sigma, \omega)}{f_{m,n}(\sigma)} - 1\right| \leq \epsilon\right] \geq 1 - \delta.$$

2. Soundness: for every $\sigma = (a_1, \ldots, a_m)$ (with alphabet $[n]$), and for every $\omega \notin \mathcal{W}_\sigma$ we
have

$$\Pr[A(\sigma, \omega) \neq \bot] \leq \delta,$$

where $\bot \notin \mathbb{R}$ is a symbol that represents that the algorithm could not verify the correctness
of the proof.

The second natural way to define an MA probabilistic proof system for the data stream 
model, is by allowing the algorithm a free access to the proof. This leads to the following 
definition: 

Definition 3.2. Let $\epsilon \geq 0$, $\delta > 0$, and let $\mathcal{P} = \{f_{m,n} : [n]^m \to \mathbb{R}\}_{m,n \in \mathbb{N}}$ be a data
stream problem. An $\widehat{\mathrm{MA}}$ streaming algorithm for $\mathcal{P}$ is a probabilistic data stream algorithm
$A^w$, which has a free oracle access to a proof string $w$. The algorithm gets a stream
$\sigma = (a_1, \ldots, a_m)$ (where $a_i \in [n]$ for every $i$) as an input, to which it has a sequential, one
pass access. Given two functions $S, W : \mathbb{N}^2 \to \mathbb{N}$, we say that an $\widehat{\mathrm{MA}}$ streaming algorithm
is $\widehat{\mathrm{MA}}_{\epsilon,\delta}(S(m, n), W(m, n))$ if it uses at most $S(m, n)$ bits of space, and satisfies:

1. Completeness: for every $\sigma = (a_1, \ldots, a_m)$ (with alphabet $[n]$), there exists a non-empty
set $\mathcal{W}_\sigma$ of proof strings of length at most $W(m, n)$, such that for every $w \in \mathcal{W}_\sigma$
we have,

$$\Pr\left[\left|\frac{A^w(\sigma)}{f_{m,n}(\sigma)} - 1\right| \leq \epsilon\right] \geq 1 - \delta.$$

2. Soundness: for every $\sigma = (a_1, \ldots, a_m)$ (with alphabet $[n]$), and for every $w \notin \mathcal{W}_\sigma$ we
have

$$\Pr[A^w(\sigma) \neq \bot] \leq \delta,$$

where $\bot \notin \mathbb{R}$ is a symbol that represents that the algorithm could not verify the correctness
of the proof.

Note that by definition, the model of $\widehat{\mathrm{MA}}$ streaming (with free access to the proof) is
stronger than the model of MA streaming with a proof stream. Thus, when we prove lower
bounds on the $\widehat{\mathrm{MA}}$ streaming complexity in a later section, they also imply lower bounds on the
MA streaming complexity.
3.2 AM Streaming Algorithms

We can further extend the data stream model to support an AM probabilistic proof system.
Similarly to the case of MA proofs, an AM streaming algorithm receives a proof stream
and an input stream, to which it has a sequential, one pass access; except that in AM proof
systems the prover and verifier also share a common random string. Formally, we define

Definition 3.3. Let $\epsilon \geq 0$, $\delta > 0$, and let $\mathcal{P} = \{f_{m,n} : [n]^m \to \mathbb{R}\}_{m,n \in \mathbb{N}}$ be a data stream
problem. An AM streaming algorithm for $\mathcal{P}$ is a probabilistic data stream algorithm $A^r$
that has an oracle access to a common random string $r$, and that is also allowed to make
private random coin tosses. The algorithm simultaneously gets two streams: an input stream
$\sigma = (a_1, \ldots, a_m)$ (where $a_i \in [n]$ for every $i$) and a proof stream $\omega$, to both it has a sequential,
one pass access. Given two functions $S, W : \mathbb{N}^2 \to \mathbb{N}$, we say that an AM streaming
algorithm is $\mathrm{AM}_{\epsilon,\delta}(S(m, n), W(m, n))$ if it uses at most $S(m, n)$ bits of space, and satisfies
that for every $\sigma = (a_1, \ldots, a_m)$ (over alphabet $[n]$), with probability at least $1 - \delta/2$ (over $r$)
there exists a non-empty set $\mathcal{W}_\sigma(r)$ of proof streams of length at most $W(m, n)$, such that:

1. Completeness: For every $\omega \in \mathcal{W}_\sigma(r)$

$$\Pr\left[\left|\frac{A^r(\sigma, \omega)}{f_{m,n}(\sigma)} - 1\right| \leq \epsilon\right] \geq 1 - \frac{\delta}{2},$$

where the probability is taken over the private random coin tosses of $A^r$.

2. Soundness: For $\omega \notin \mathcal{W}_\sigma(r)$

$$\Pr[A^r(\sigma, \omega) = \bot] \geq 1 - \frac{\delta}{2},$$

where the probability is taken over the private random coin tosses of $A^r$, and $\bot \notin \mathbb{R}$
is a symbol that represents that the algorithm could not verify the correctness of the
proof.

The randomness complexity of the algorithm is the total size of the common random string
$r$, and the number of private random coin tosses that the algorithm performs.

Note that we slightly deviate from the standard definition of an AM algorithm, by
allowing $A$ to be a probabilistic algorithm with a private random string.

Just as with the MA streaming model, we can define $\widehat{\mathrm{AM}}$ streaming algorithms by
allowing free access to the proof. Again, by definition the model of $\widehat{\mathrm{AM}}$ streaming with
free access to the proof is stronger than the model of AM streaming with a proof stream. Our
canonical AM algorithm works for the weaker model, wherein the proof is being streamed,
thus our AM upper bounds also imply $\widehat{\mathrm{AM}}$ upper bounds.

Note 1: In both of the models (MA and AM), as traditionally done in Arthur-Merlin
probabilistic proof systems, we will sometimes describe the MA/AM algorithm as an
interaction between an omniscient prover Merlin, who sends an alleged proof of a statement
to Arthur, a computationally limited verifier (in our case, a streaming algorithm), who in
turn probabilistically verifies the correctness of Merlin's proof.

Note 2: In all of our (MA and AM) algorithms, we assume without loss of generality that Arthur knows both the length $m$ and the alphabet size $n$. This can be done since we can insert $m, n$ at the beginning of the proof. Then, Arthur only needs to verify that the length of the stream was indeed $m$, and that no element was bigger than $n$. Since all of the algorithms we present in this paper are $\Omega(\log m + \log n)$ in both proof size and space complexity, this does not change their overall asymptotic complexity.

4 The Canonical AM Streaming Algorithm 

In this section we show our canonical AM algorithm. Recall that given a data stream $\sigma = (a_1, \ldots, a_m)$ (over alphabet $[n]$), the element indicator $\chi_i : [n] \to \{0,1\}$ of the $i$'th element ($i \in [m]$) of the stream $\sigma$ is the function that indicates whether a given element is in position $i$ of $\sigma$, i.e., $\chi_i(j) = 1$ if and only if $a_i = j$. Furthermore, let $\chi : [n] \to \{0,1\}^m$ be the element indicator of $\sigma$, defined by

\[ \chi(j) = (\chi_1(j), \ldots, \chi_m(j)). \]

In addition, given $n \in \mathbb{N}$ we define a clause over $n$ variables $x_1, \ldots, x_n$ as a function $C : \{0,1\}^n \to \{0,1\}$ of the form $(y_1 \vee y_2 \vee \ldots \vee y_n)$, where for every $i \in [n]$ the literal $y_i$ is either a variable ($x_j$), a negation of a variable ($\neg x_j$), or one of the constants $\{0,1\}$.

We prove the following theorem: 

Theorem 4.1. Let $0 \le \varepsilon < 1/2$. Let $\mathcal{P}$ be a data stream problem such that for every $m, n \in \mathbb{N}$ there exists a set of $k = k(m,n)$ clauses $\{C_t\}_{t \in [k]}$ over $m$ variables, and a function $\psi : \{0,1\}^k \to \mathbb{Z}$, such that for every data stream $\sigma = (a_1, \ldots, a_m)$ with alphabet $[n]$,

\[ (1-\varepsilon)\,\mathcal{P}(\sigma) \le \sum_{j=1}^{n} \psi\big(C_1 \circ \chi(j), \ldots, C_k \circ \chi(j)\big) \le (1+\varepsilon)\,\mathcal{P}(\sigma). \]

Moreover, we assume that $\psi$ and $\{C_t\}_{t \in [k]}$ are known to the verifier, and that there exists $B \le \mathrm{poly}(m,n)$ such that $\psi(x) < B$ for every $x \in \{0,1\}^k$. Then, for every $0 < \delta \le 1$ and every $s, w \in \mathbb{N}$ such that $s \cdot w \ge n$, there exists an explicit $\mathcal{AM}_{\varepsilon,\delta}(S, W)$-streaming algorithm for approximating $\mathcal{P}(\sigma)$; where $S = O(sk \cdot \mathrm{polylog}(m, n, \delta^{-1}))$, $W = O(wk \cdot \mathrm{polylog}(m, n, \delta^{-1}))$, and the randomness complexity is $\mathrm{polylog}(m, n, \delta^{-1})$.

Proof. Let $0 \le \varepsilon < 1/2$. Let $\mathcal{P}$ be a data stream problem such that for every $m, n \in \mathbb{N}$ there exists a set of $k = k(m,n)$ clauses $\{C_t\}_{t \in [k]}$ over $m$ variables, and a function $\psi : \{0,1\}^k \to \mathbb{Z}$, such that for every data stream $\sigma = (a_1, \ldots, a_m)$ with alphabet $[n]$,

\[ (1-\varepsilon)\,\mathcal{P}(\sigma) \le \sum_{j=1}^{n} \psi\big(C_1 \circ \chi(j), \ldots, C_k \circ \chi(j)\big) \le (1+\varepsilon)\,\mathcal{P}(\sigma). \tag{4.1} \]

Assume that $\psi$ and $\{C_t\}_{t \in [k]}$ are known to the verifier, and that there exists $B \le \mathrm{poly}(m,n)$ such that $\psi(x) < B$ for every $x \in \{0,1\}^k$. Observe that since $\psi$ gets $\{0,1\}$ values as inputs, we can think of $\psi$ as a multilinear polynomial. Assume without loss of generality that $k < m$ (otherwise the theorem follows trivially).

Let $0 < \delta \le 1$ and let $s, w \in \mathbb{N}$ such that $s \cdot w \ge n$ (assume for simplicity and without loss of generality that $s \cdot w = n$ exactly). We show that there exists an explicit $\mathcal{AM}_{\varepsilon,\delta}(S, W)$-streaming algorithm for approximating $\mathcal{P}(\sigma)$; where

\[ S = O(sk \cdot \mathrm{polylog}(m, n, \delta^{-1})), \qquad W = O(wk \cdot \mathrm{polylog}(m, n, \delta^{-1})), \]

and the randomness complexity is $\mathrm{polylog}(m, n, \delta^{-1})$.

Let $\sigma = (a_1, \ldots, a_m)$ be a data stream with alphabet $[n]$. The first step is representing the middle term of (4.1) as a summation of a low degree polynomial over some domain. Specifically, we represent the element indicators $\{\chi_i\}_{i \in [m]}$ as bivariate polynomials over a finite field.

Let $p$ be a sufficiently large (to be determined later) prime number of order

\[ \frac{1}{\delta} \cdot \mathrm{poly}(m,n) \]

such that $p > 2nB > \mathcal{P}(\sigma)$. Let $\mathcal{D}_s(\mathbb{F}_p)$ be any efficiently enumerable subset, of cardinality $s$, of the field $\mathbb{F}_p$ (e.g., the lexicographically first elements in some representation of the field $\mathbb{F}_p$). Likewise, let $\mathcal{D}_w(\mathbb{F}_p)$ be any efficiently enumerable subset, of cardinality $w$, of the field $\mathbb{F}_p$. Note that since $n = w \cdot s$, there exists a one-to-one mapping between the domain $[n]$ and the domain $\mathcal{D}_w(\mathbb{F}_p) \times \mathcal{D}_s(\mathbb{F}_p)$. Fix such an (efficiently computable) mapping $\pi : [n] \to \mathcal{D}_w(\mathbb{F}_p) \times \mathcal{D}_s(\mathbb{F}_p)$ (e.g., according to the lexicographic order).

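For every $i \in [m]$ we can view $\chi_i : [n] \to \{0,1\}$ as a bivariate polynomial $\tilde\chi_i : \mathbb{F}_p^2 \to \mathbb{F}_p$ of degree $w - 1$ in the first variable (which we denote by $x$), and degree $s - 1$ in the second variable (which we denote by $y$), such that for every $j \in [n]$ we have $\tilde\chi_i \circ \pi(j) = \chi_i(j)$. If we denote $(\alpha_i, \beta_i) := \pi(a_i)$, then the extension $\tilde\chi_i : \mathbb{F}_p^2 \to \mathbb{F}_p$ is given explicitly by the Lagrange interpolation polynomial:

\[ \tilde\chi_i(x, y) = \frac{\prod_{a \in \mathcal{D}_w(\mathbb{F}_p),\, a \ne \alpha_i} (x - a) \; \prod_{b \in \mathcal{D}_s(\mathbb{F}_p),\, b \ne \beta_i} (y - b)}{\prod_{a \in \mathcal{D}_w(\mathbb{F}_p),\, a \ne \alpha_i} (\alpha_i - a) \; \prod_{b \in \mathcal{D}_s(\mathbb{F}_p),\, b \ne \beta_i} (\beta_i - b)}. \tag{4.2} \]

Note that for every $\xi \in \mathcal{D}_s(\mathbb{F}_p)$, the degree of the univariate polynomial $\tilde\chi_i(\cdot, \xi) : \mathbb{F}_p \to \mathbb{F}_p$ is at most $w - 1$.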
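The Lagrange extension (4.2) can be checked numerically on a toy instance. The following is a minimal sketch, assuming $\mathcal{D}_w = \{0, \ldots, w-1\}$, $\mathcal{D}_s = \{0, \ldots, s-1\}$ and a lexicographic mapping; the helper names `pi` and `indicator_extension` are illustrative and not part of the paper.

```python
def pi(j, s):
    """Map j in {1, ..., n} to a grid point (a, b) with a in D_w, b in D_s."""
    return divmod(j - 1, s)


def indicator_extension(alpha, beta, D_w, D_s, p):
    """Return chi(x, y): the Lagrange polynomial over F_p that equals 1 at
    (alpha, beta) and 0 at every other point of D_w x D_s."""
    def chi(x, y):
        num, den = 1, 1
        for a in D_w:
            if a != alpha:
                num = num * (x - a) % p
                den = den * (alpha - a) % p
        for b in D_s:
            if b != beta:
                num = num * (y - b) % p
                den = den * (beta - b) % p
        # Division in F_p is multiplication by the modular inverse.
        return num * pow(den, -1, p) % p
    return chi


p, w, s = 101, 3, 4            # toy parameters with n = w * s = 12
D_w, D_s = range(w), range(s)
a_i = 7                        # the i'th stream element
chi_i = indicator_extension(*pi(a_i, s), D_w, D_s, p)
values = [chi_i(*pi(j, s)) for j in range(1, w * s + 1)]
```

On the grid the extension agrees with the Boolean indicator (it is 1 only at $j = a_i$), while off the grid it takes arbitrary field values; the protocol uses exactly this to evaluate at a random point.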

Let $\tilde\chi : \mathbb{F}_p^2 \to \mathbb{F}_p^m$ be the polynomial extension of the element indicator of $\sigma$, defined by

\[ \tilde\chi(x, y) = (\tilde\chi_1(x, y), \ldots, \tilde\chi_m(x, y)). \]

Plugging in the polynomial extensions of the element indicators to (4.1) yields that

\[ \sum_{x \in \mathcal{D}_w(\mathbb{F}_p)} \sum_{y \in \mathcal{D}_s(\mathbb{F}_p)} \psi\big(C_1 \circ \tilde\chi(x, y), \ldots, C_k \circ \tilde\chi(x, y)\big) \tag{4.3} \]

(where the summation is over $\mathbb{Z}$) approximates $\mathcal{P}(\sigma)$ within a multiplicative factor of $1 \pm \varepsilon$. Later, we will give analogous expressions of $\mathcal{P}(\sigma) \pmod q$ for prime numbers $q = O(\log p)$.

Next, we replace each clause in (4.3) with a low degree polynomial (over a small finite field) that approximates it. Towards this end, we show the following lemma (originated in [Raz87, Smo87]):

Lemma 4.2. Let $\delta' > 0$, let $q$ be a prime number, and let $\{C_t\}_{t \in [k]}$ be a set of $k$ clauses over $m$ variables. Using $\mathrm{polylog}(m, k, \delta'^{-1})$ random coin flips, we can construct a set of polynomials $\{p_t : \mathbb{F}_q^m \to \mathbb{F}_q\}_{t \in [k]}$ of degree $O(q \log(k/\delta'))$ each, such that for every $x \in \{0,1\}^m$,

\[ \Pr\left[ \forall t \in [k] \;\; p_t(x) = C_t(x) \right] \ge 1 - \delta' \]

(where the probability is taken over the random coin flips performed during the construction of $\{p_t\}_{t \in [k]}$).

Proof. Consider $\mathcal{C} := \{C_t\}_{t \in [k]}$, where for every $t \in [k]$, $C_t$ is a clause over $m$ variables. We approximate each $C_t \in \mathcal{C}$ by a polynomial $p_t : \mathbb{F}_q^m \to \mathbb{F}_q$. Recall that every clause in $\mathcal{C}$ is an $m$-variate disjunction gate that operates on literals, which are either a variable, or a negation of a variable, or one of the constants $\{0,1\}$.

In order to construct a polynomial approximation of a clause $C_t \in \mathcal{C}$, we first replace each negation gate over a variable $x$ in $C_t$ with the polynomial $1 - x$. Note that this polynomial computes the negation exactly (i.e., no approximation).

Next, we use the method of [Raz87, Smo87] to approximate the $m$-variate disjunction gate of $C_t$, by constructing an approximation polynomial in the following way: let $\mathrm{ECC} : \mathbb{F}_q^m \to \mathbb{F}_q^{100m}$ be a linear error correcting code with relative distance $1/3$. Fix

\[ L = O\left(\log k + \log \frac{1}{\delta'}\right), \]

such that

\[ \left(\frac{2}{3}\right)^{L} \le \frac{\delta'}{k}, \]

and choose independently and uniformly at random $\iota_1, \ldots, \iota_L \in [100m]$. We build a low degree polynomial approximation for the Boolean disjunction function. Consider $\eta : \mathbb{F}_q^m \to \mathbb{F}_q$, defined by

\[ \eta(z_1, \ldots, z_m) = 1 - \prod_{l=1}^{L} \left(1 - \big(\mathrm{ECC}(z_1, \ldots, z_m)_{\iota_l}\big)^{q-1}\right). \]

Since ECC is linear, $\eta$ is a polynomial of degree $O(L \cdot q)$ in the variables $z_1, \ldots, z_m$. Observe that the linearity and the relative distance of ECC, together with Fermat's little theorem, imply that for every $(x_1, \ldots, x_m) \in \{0,1\}^m$,

\[ \Pr\left[ \eta(x_1, \ldots, x_m) \ne \bigvee_{i=1}^{m} x_i \right] \le \frac{\delta'}{k} \tag{4.4} \]

(where the probability is taken over the random choices of $\iota_1, \ldots, \iota_L \in [100m]$). Note that we use the same polynomial $\eta$ for all of the clauses in $\mathcal{C}$. Thus, the total number of coin flips that we use is $\mathrm{polylog}(m, k, \delta'^{-1})$. The last step of the construction is defining $p_t$ as the composition of the disjunction polynomial $\eta$ and the literals in the clause $C_t$.

Note that applying the approximation procedure that we described above to all of the clauses in $\mathcal{C}$ results in a set of $k$ polynomials $\{p_t : \mathbb{F}_q^m \to \mathbb{F}_q\}_{t \in [k]}$, where for every $t \in [k]$ the degree of $p_t$ is $O(q \log(k/\delta'))$. We conclude the proof of the lemma by noticing that (4.4) together with a union bound imply that for every $x \in \{0,1\}^m$,

\[ \Pr\left[ \forall t \in [k] \;\; p_t(x) = C_t(x) \right] \ge 1 - \delta' \]

(where the probability is over the random choices of $\iota_1, \ldots, \iota_L \in [100m]$).

□
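To illustrate the shape of the disjunction polynomial $\eta$, here is a small sketch of the Razborov-Smolensky approximation. Note the substitution: instead of sampling $L$ coordinates of a linear error correcting code (which is what the proof above does in order to save random bits), the sketch samples $L$ uniformly random linear forms over $\mathbb{F}_q$, for which the per-form failure probability on a nonzero input is $1/q$. The function name and parameters are illustrative.

```python
import itertools
import random


def make_or_approximator(m, q, L, rng):
    """Sample L random linear forms over F_q and return the polynomial
    eta(z) = 1 - prod_l (1 - <r_l, z>^(q-1)) as a Python function."""
    forms = [[rng.randrange(q) for _ in range(m)] for _ in range(L)]

    def eta(z):
        prod = 1
        for r in forms:
            v = sum(ri * zi for ri, zi in zip(r, z)) % q
            # By Fermat's little theorem v^(q-1) is 1 if v != 0 and 0 if v == 0.
            prod = prod * (1 - pow(v, q - 1, q)) % q
        return (1 - prod) % q

    return eta


m, q, L = 6, 3, 20
eta = make_or_approximator(m, q, L, random.Random(0))
mismatches = [x for x in itertools.product((0, 1), repeat=m)
              if eta(x) != int(any(x))]
```

The all-zero input is always mapped to 0; a nonzero input is mapped to 1 unless all $L$ forms vanish on it, which here happens with probability $3^{-20}$ per input.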



Observe that by applying Lemma 4.2 with $\delta' = \delta$ and $p$ as the prime number, we can represent (4.3) as a summation over a polynomial. However, the degree of this polynomial (which is dominated by $p$) is too high for our needs. Instead, we approximate (4.3) by $O(\log p)$ low degree polynomials.

We start by introducing the necessary notations. Let $Q = \{q_1, \ldots, q_{\rho(c \log p)}\}$ (where $\rho : \mathbb{N} \to \mathbb{N}$ is the prime counting function) be the set of all prime numbers that are smaller than or equal to $c \log p$, where $c$ is a constant such that

\[ \prod_{q \in Q} q > p. \]

For every $q \in Q$ denote $\mathbb{H}_q := \mathbb{F}_{q^{\lambda_q}}$, where $\lambda_q$ is the minimum integer that satisfies $q^{\lambda_q} \ge p$. Since $q = O(\log p)$, and by the minimality of $\lambda_q$, we have $|\mathbb{H}_q| < pq = O(p \log p)$. Furthermore,

\[ \mathcal{P}(\sigma) \pmod q = \sum_{x \in \mathcal{D}_w(\mathbb{F}_p)} \sum_{y \in \mathcal{D}_s(\mathbb{F}_p)} \psi\big(C_1 \circ \tilde\chi(x, y), \ldots, C_k \circ \tilde\chi(x, y)\big) \pmod q \tag{4.5} \]

(where we can think of the summation over $\mathbb{Z}$ modulo $q$ as summation over $\mathbb{F}_q$). Denote

\[ \mathcal{P}_q(\sigma) := \mathcal{P}(\sigma) \pmod q. \]



Analogously to the definitions for $\mathbb{F}_p$; for every prime $q \in Q$ we define efficiently enumerable subsets $\mathcal{D}_s(\mathbb{H}_q)$, $\mathcal{D}_w(\mathbb{H}_q)$ of $\mathbb{H}_q$, with cardinality $s, w$ (respectively), and a one-to-one mapping $\pi_q : [n] \to \mathcal{D}_w(\mathbb{H}_q) \times \mathcal{D}_s(\mathbb{H}_q)$. For every $i \in [m]$, we can view $\chi_i : [n] \to \{0,1\}$ as a bivariate polynomial $\tilde\chi_i^q : \mathbb{H}_q^2 \to \mathbb{H}_q$ of degree $w - 1$ in the first variable (which we denote by $x$), and degree $s - 1$ in the second variable (which we denote by $y$), such that for every $j \in [n]$ we have $\tilde\chi_i^q \circ \pi_q(j) = \chi_i(j)$. Let $\tilde\chi^q : \mathbb{H}_q^2 \to \mathbb{H}_q^m$ be defined by

\[ \tilde\chi^q(x, y) = (\tilde\chi_1^q(x, y), \ldots, \tilde\chi_m^q(x, y)). \]

Moreover, we can think of the multilinear polynomial $\psi : \{0,1\}^k \to \mathbb{Z}$ as a multilinear polynomial $\psi : \mathbb{F}_p^k \to \mathbb{F}_p$ (recall that $\psi(x) < B < p$ for every $x \in \{0,1\}^k$). Let $\psi_q : \mathbb{F}_q^k \to \mathbb{F}_q$ be the polynomial function defined by the formal polynomial (i.e., a summation of monomials multiplied by coefficients) $\psi$, where we take each coefficient of $\psi$ modulo $q$. Since $\mathbb{F}_q$ is a subfield of $\mathbb{H}_q$, we can also view $\psi_q$ as a multilinear polynomial from $\mathbb{H}_q^k$ to $\mathbb{H}_q$.

Thus, we can express (4.5) as follows:

\[ \mathcal{P}_q(\sigma) = \sum_{x \in \mathcal{D}_w(\mathbb{H}_q)} \sum_{y \in \mathcal{D}_s(\mathbb{H}_q)} \psi_q\big(C_1 \circ \tilde\chi^q(x, y), \ldots, C_k \circ \tilde\chi^q(x, y)\big) \tag{4.6} \]

(where the summation is over $\mathbb{H}_q$, which in this case is equal to summation over $\mathbb{F}_q$, hence the modulo $q$).$^2$

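For every $q \in Q$, we apply Lemma 4.2 with $\delta' = \frac{\delta}{2nc \log p}$, and $q$ as the prime number. We get a set of polynomials

\[ \{p_t : \mathbb{F}_q^m \to \mathbb{F}_q\}_{t \in [k]} \]

(for every $q \in Q$), of degree $O\left(q \log \frac{kn \log p}{\delta}\right)$ each, such that for every $x \in \{0,1\}^m$,

\[ \Pr\left[ \forall t \in [k] \;\; p_t(x) = C_t(x) \right] \ge 1 - \frac{\delta}{2nc \log p} \tag{4.7} \]

(where the probability is taken over the random coin flips performed during the construction of $\{p_t\}_{t \in [k]}$).

Since $\mathbb{F}_q$ is a subfield of $\mathbb{H}_q$, we can view $p_t : \mathbb{F}_q^m \to \mathbb{F}_q$ as a polynomial $\tilde p_t : \mathbb{H}_q^m \to \mathbb{H}_q$ (for every $t \in [k]$). Then, for every $x \in \mathbb{F}_q^m$ we have $\tilde p_t(x) = p_t(x)$. Thus, we get the following set of polynomials:

\[ \{\tilde p_t : \mathbb{H}_q^m \to \mathbb{H}_q\}_{t \in [k]}, \]

where for every $t \in [k]$, the degree of $\tilde p_t$ is $O\left(q \log \frac{kn \log p}{\delta}\right)$. Applying a union bound, and using (4.7), yields:

\[ \Pr\left[ \mathcal{P}_q(\sigma) = \sum_{x \in \mathcal{D}_w(\mathbb{H}_q)} \sum_{y \in \mathcal{D}_s(\mathbb{H}_q)} \psi_q\big(\tilde p_1 \circ \tilde\chi^q(x, y), \ldots, \tilde p_k \circ \tilde\chi^q(x, y)\big) \right] \ge 1 - \frac{\delta}{2c \log p} \tag{4.8} \]

(where the probability is taken over the random coin flips performed during the construction of $\{p_t\}_{t \in [k]}$, and the summation is over $\mathbb{H}_q$).$^3$

$^2$ Since for every $x \in \mathcal{D}_w(\mathbb{H}_q)$ and $y \in \mathcal{D}_s(\mathbb{H}_q)$ we have $(C_1 \circ \tilde\chi^q(x, y), \ldots, C_k \circ \tilde\chi^q(x, y)) \in \{0,1\}^k$, each summand is in $\mathbb{F}_q$. Hence we can think of the summation as summation over $\mathbb{F}_q$.

Next, we define the polynomial $\omega_q : \mathbb{H}_q \to \mathbb{H}_q$ by

\[ \omega_q(x) = \sum_{y \in \mathcal{D}_s(\mathbb{H}_q)} \psi_q\big(\tilde p_1 \circ \tilde\chi^q(x, y), \ldots, \tilde p_k \circ \tilde\chi^q(x, y)\big) \]

(where the summation is over $\mathbb{H}_q$). Note that for every $t \in [k]$, the composition of $\tilde p_t$ and $\tilde\chi^q$ is a polynomial of degree

\[ O\left(wq \log \frac{kn \log p}{\delta}\right) \]

in $x$ (the first variable). Hence, by the multilinearity of $\psi$,

\[ \deg(\omega_q) = O\left(wkq \log \frac{kn \log p}{\delta}\right). \tag{4.9} \]

By (4.8) we have,

\[ \Pr\left[ \mathcal{P}_q(\sigma) = \sum_{x \in \mathcal{D}_w(\mathbb{H}_q)} \omega_q(x) \right] \ge 1 - \frac{\delta}{2c \log p} \tag{4.10} \]

(where the probability is taken over the random coin flips performed during the construction of $\{p_t\}_{t \in [k]}$, and the summation is over $\mathbb{H}_q$).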

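Once we have established the above, we can finally describe Merlin's proof stream. The proof stream $\varphi$ consists of all the proof polynomials $\{\omega_q\}_{q \in Q}$. We send each polynomial by its list of coefficients, thus we need at most

\[ O\left(|Q| \cdot wk \log(p) \log \frac{kn \log p}{\delta} \cdot \log(p \log p)\right) \]

bits in order to write down the proof stream. Since $|Q| \le c \log p$, we conclude:

Claim 4.3. The total size of Merlin's proof stream $\varphi$ is

\[ O\left(wk \cdot \mathrm{polylog}(m, n, \delta^{-1})\right). \]

Observe that it is possible to reconstruct $\mathcal{P}(\sigma)$ from the polynomials given in Merlin's proof. We formalize this claim as follows:

Claim 4.4. Given the set of values $\left\{\sum_{x \in \mathcal{D}_w(\mathbb{H}_q)} \omega_q(x)\right\}_{q \in Q}$, it is possible to compute $\mathcal{P}(\sigma)$ with probability $1 - \delta/2$ (over the random coin tosses that were performed during the construction of $\{p_t\}_{t \in [k]}$).

$^3$ Again, since for every $x \in \mathcal{D}_w(\mathbb{H}_q)$ and $y \in \mathcal{D}_s(\mathbb{H}_q)$ we have $(\tilde p_1 \circ \tilde\chi^q(x, y), \ldots, \tilde p_k \circ \tilde\chi^q(x, y)) \in \{0,1\}^k$, each summand is in $\mathbb{F}_q$. Hence, the summation is modulo $q$.

Proof. Note that for every $q \in Q$ we have

\[ \Pr\left[ \sum_{x \in \mathcal{D}_w(\mathbb{H}_q)} \omega_q(x) = \mathcal{P}(\sigma) \pmod q \right] \ge 1 - \frac{\delta}{2c \log p}. \]

Hence, by a union bound over the at most $c \log p$ primes in $Q$,

\[ \Pr\left[ \forall q \in Q \;\; \sum_{x \in \mathcal{D}_w(\mathbb{H}_q)} \omega_q(x) = \mathcal{P}(\sigma) \pmod q \right] \ge 1 - \frac{\delta}{2}. \]

By the Chinese remainder theorem, given $\{\mathcal{P}(\sigma) \pmod q\}_{q \in Q}$ we can calculate

\[ \mathcal{P}(\sigma) \pmod{\textstyle\prod_{q \in Q} q}. \]

Since we have chosen $Q$ such that $\prod_{q \in Q} q > p$, the claim follows.

□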
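The reconstruction step in Claim 4.4 is a standard application of the Chinese remainder theorem. The sketch below, with illustrative helper names and a toy value standing in for $\mathcal{P}(\sigma)$, collects all primes up to a bound (playing the role of $c \log p$) and recovers an integer from its residues.

```python
from math import prod


def primes_up_to(limit):
    """All primes q <= limit, by trial division (fine for small limits)."""
    return [q for q in range(2, limit + 1)
            if all(q % d for d in range(2, int(q ** 0.5) + 1))]


def crt(residues):
    """Given {q: r_q} for pairwise coprime moduli q, return the unique x in
    [0, prod of the moduli) with x = r_q (mod q) for every q."""
    x, modulus = 0, 1
    for q, r in residues.items():
        # Choose t so that x + modulus * t = r (mod q).
        t = (r - x) * pow(modulus, -1, q) % q
        x += modulus * t
        modulus *= q
    return x


Q = primes_up_to(23)
value = 123456                      # toy stand-in for P(sigma); must be < prod(Q)
residues = {q: value % q for q in Q}
recovered = crt(residues)
```

Since the product of the primes exceeds the value being reconstructed, the residues determine it uniquely.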



Another important property of the polynomials $\{\omega_q\}_{q \in Q}$ in the proof stream is that given a sequential, one-pass access to the input stream, it is possible to efficiently evaluate each polynomial at a specific point. Formally, we show:

Lemma 4.5. For every $q \in Q$, there exists a streaming algorithm $A_q$ with access to the common random string $r$, such that given a point in the finite field $\xi \in \mathbb{H}_q$, and a sequential, one-pass access to the input stream $\sigma$, the streaming algorithm $A_q$ can evaluate $\omega_q(\xi)$ using $O(sk \cdot \mathrm{polylog}(m, n, \delta^{-1}))$ bits of space.

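Proof. First, recall that the descriptions of $\{C_t\}_{t \in [k]}$ and $\psi$ are known to the verifier. Note that in order to compute $\omega_q(\xi)$ it is sufficient to compute and store the values of

\[ \left\{ \tilde p_t\big(\tilde\chi_1^q(\xi, y), \ldots, \tilde\chi_m^q(\xi, y)\big) \right\}_{t \in [k],\; y \in \mathcal{D}_s(\mathbb{H}_q)}, \]

where $\{\tilde p_t\}_{t \in [k]}$ are the approximation polynomials of the clauses $\{C_t\}_{t \in [k]}$ over $\mathbb{H}_q$. Given these values we can compute

\[ \psi_q\Big( \tilde p_1\big(\tilde\chi_1^q(\xi, y), \ldots, \tilde\chi_m^q(\xi, y)\big), \ldots, \tilde p_k\big(\tilde\chi_1^q(\xi, y), \ldots, \tilde\chi_m^q(\xi, y)\big) \Big) \]

monomial-by-monomial according to the description of $\psi$, and then compute $\omega_q(\xi)$ by summing term-by-term.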

Before we describe the algorithm, recall that during the construction of $\{p_t\}_{t \in [k]}$ we defined an error correcting code $\mathrm{ECC} : \mathbb{F}_q^m \to \mathbb{F}_q^{100m}$ with relative distance $1/3$. Note that since ECC is a linear function, we can extend it (via the linear extension) to $\mathbb{H}_q$. We fixed

\[ L = O\left(\log k + \log \frac{1}{\delta'}\right) = O\left(\log k + \log \frac{n \log p}{\delta}\right) \]

and chose independently and uniformly $\iota_1, \ldots, \iota_L \in [100m]$, using the common random string $r$. Finally we approximated each of the $\vee$ gates by the following polynomial,

\[ \eta(z_1, \ldots, z_m) = 1 - \prod_{l=1}^{L} \left(1 - \big(\mathrm{ECC}(z_1, \ldots, z_m)_{\iota_l}\big)^{q-1}\right). \tag{4.11} \]

Note that in order to compute

\[ \tilde p_t\big(\tilde\chi_1^q(\xi, y), \ldots, \tilde\chi_m^q(\xi, y)\big) \]

for all $t \in [k]$ and $y \in \mathcal{D}_s(\mathbb{H}_q)$, it is sufficient to compute

\[ \mathrm{ECC}\big(\ell_1^t(\xi, y), \ldots, \ell_m^t(\xi, y)\big)_{\iota} \]

(where for every $i \in [m]$ and $t \in [k]$ the value $\ell_i^t(\xi, y)$ is either $\tilde\chi_i^q(\xi, y)$, or $1 - \tilde\chi_i^q(\xi, y)$, or one of the constants $\{0,1\}$; depending on the clause $C_t$), for all $\iota \in \{\iota_1, \ldots, \iota_L\}$, $t \in [k]$, and $y \in \mathcal{D}_s(\mathbb{H}_q)$. Then we can compute $\tilde p_t\big(\tilde\chi_1^q(\xi, y), \ldots, \tilde\chi_m^q(\xi, y)\big)$ according to (4.11).

Since ECC is a linear error correcting code, we can compute each

\[ \mathrm{ECC}\big(\ell_1^t(\xi, y), \ldots, \ell_m^t(\xi, y)\big)_{\iota} \]

incrementally. That is, we read the data stream $\sigma$ element-by-element. At each step, when the $i$'th element arrives ($i \in [m]$), for every $y \in \mathcal{D}_s(\mathbb{H}_q)$ we compute $\tilde\chi_i^q(\xi, y)$ according to (4.2), and then $\ell_i^t(\xi, y)$ according to the description of $C_t$. By the linearity of ECC we can compute $\mathrm{ECC}\big(\ell_1^t(\xi, y), \ldots, \ell_m^t(\xi, y)\big)_{\iota}$ by incrementally adding each

\[ \mathrm{ECC}\big(0, \ldots, 0, \ell_i^t(\xi, y), 0, \ldots, 0\big)_{\iota} \]

at the $i$'th step.

Observe that during the run over $\sigma$, the entire computation is performed element-by-element, and that we used at most $O(|\mathcal{D}_s(\mathbb{H}_q)| \cdot k \cdot L \cdot \log p)$ bits of space. Thus the overall space complexity is

\[ O\left(sk \cdot \mathrm{polylog}(m, n, \delta^{-1})\right). \]

□
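The incremental step in the proof above relies only on linearity: a codeword coordinate is a fixed linear combination of the inputs, so it can be accumulated one input symbol at a time. The sketch below shows this with a toy generator-matrix entry function `gen_entry`, which is purely hypothetical (it has no distance guarantee) and only stands in for an explicit linear code.

```python
def gen_entry(i, iota, q):
    """Toy entry G[i][iota] of a generator matrix over F_q (hypothetical;
    it stands in for an explicit linear code and has no distance guarantee)."""
    return (31 * i + 17 * iota + 1) % q


def stream_codeword_coords(literals, coords, q):
    """Consume literal values one at a time and maintain, for each sampled
    coordinate iota, the running value of ECC(l_1, ..., l_m)_iota."""
    acc = {iota: 0 for iota in coords}
    for i, lit in enumerate(literals):
        for iota in coords:
            # Linearity: ECC(0, ..., 0, lit, 0, ..., 0)_iota = lit * G[i][iota].
            acc[iota] = (acc[iota] + lit * gen_entry(i, iota, q)) % q
    return acc


q, coords = 7, [3, 8]
literals = [1, 0, 1, 1, 0]
streamed = stream_codeword_coords(iter(literals), coords, q)
offline = {iota: sum(l * gen_entry(i, iota, q) for i, l in enumerate(literals)) % q
           for iota in coords}
```

The streaming pass stores one field element per sampled coordinate, which is the source of the factor $L$ in the space bound; the factors $k$ and $|\mathcal{D}_s(\mathbb{H}_q)|$ come from repeating this for every clause and every $y$.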

The last lemma helps us to show that with high probability Merlin cannot cheat Arthur by using maliciously chosen proof polynomials. We show that by evaluating the actual proof polynomials at a randomly chosen point, Arthur can detect a false proof with high probability. Formally:

Lemma 4.6. For every $q \in Q$, given a polynomial $\hat\omega_q : \mathbb{H}_q \to \mathbb{H}_q$ of degree at most $O\left(wkq \log \frac{kn \log p}{\delta}\right)$,$^4$ if $\omega_q \ne \hat\omega_q$ then:

\[ \Pr\left[ \omega_q(\xi) = \hat\omega_q(\xi) \right] \le \frac{\delta}{2}, \]

where the probability is taken over uniformly choosing at random an element $\xi \in \mathbb{H}_q$.

$^4$ More precisely, the degree is exactly as in (4.9).

Proof. Let $\xi$ be an element uniformly chosen from $\mathbb{H}_q$. By the Schwartz-Zippel Lemma, we have

\[ \Pr\left[ \omega_q(\xi) = \hat\omega_q(\xi) \right] \le \frac{O\left(wkq \log \frac{kn \log p}{\delta}\right)}{|\mathbb{H}_q|} \le \frac{\delta}{2}, \]

where in order to get the last inequality we fix $p$ to be a sufficiently large prime number, of order

\[ \frac{1}{\delta} \cdot \mathrm{poly}(m,n). \]

□
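The univariate case of the Schwartz-Zippel bound used above says that two distinct polynomials of degree at most $d$ over a field agree on at most $d$ points. A small exhaustive check over $\mathbb{F}_{101}$, with toy polynomials chosen for illustration:

```python
def poly_eval(coeffs, x, p):
    """Evaluate a polynomial over F_p by Horner's rule; coeffs[j] multiplies x**j."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def agreement_count(f, g, p):
    """Number of points xi in F_p with f(xi) == g(xi)."""
    return sum(poly_eval(f, xi, p) == poly_eval(g, xi, p) for xi in range(p))


p = 101
omega = [5, 0, 0, 1]        # 5 + x^3, plays the role of the true polynomial
omega_hat = [5, 1, 0, 0]    # 5 + x, a different polynomial of degree <= 3
agree = agreement_count(omega, omega_hat, p)
```

Here the difference is $x^3 - x$, whose roots are $0, 1, -1$, so a uniformly random evaluation point exposes the mismatch except with probability $3/101$.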

Finally, building upon the aforementioned lemmas, we can present the AM algorithm for the approximation of $\mathcal{P}(\sigma)$:

The prover (Merlin):

1. Choose $\iota_1, \ldots, \iota_L \in [100m]$ using the common random string $r$.

2. Construct $\varphi$ that consists of all the proof polynomials $\{\omega_q\}_{q \in Q}$.

3. Send (via streaming) $\varphi = \{\omega_q\}_{q \in Q}$ to the verifier.

The verifier (Arthur):

1. For every $q \in Q$, select uniformly at random $\xi_q \in \mathbb{H}_q$ (where the selection uses Arthur's private random coin tosses).

2. Read Merlin's proof stream $\varphi = \{\hat\omega_q\}_{q \in Q}$ and (incrementally) compute:

(a) $\{\hat\omega_q(\xi_q)\}_{q \in Q}$.

(b) $\left\{\sum_{x \in \mathcal{D}_w(\mathbb{H}_q)} \hat\omega_q(x)\right\}_{q \in Q}$.

3. Run $\{A_q\}_{q \in Q}$ in parallel, in order to compute $\{\omega_q(\xi_q)\}_{q \in Q}$.

4. If there exists $q \in Q$ for which $\omega_q(\xi_q) \ne \hat\omega_q(\xi_q)$, return $\bot$.

5. Otherwise, use $\left\{\sum_{x \in \mathcal{D}_w(\mathbb{H}_q)} \hat\omega_q(x)\right\}_{q \in Q}$ to extract and return $\mathcal{P}(\sigma)$.

Figure 1: The Canonical AM streaming algorithm

Last, we show that the aforementioned algorithm is an $\mathcal{AM}_{\varepsilon,\delta}(S, W)$-streaming algorithm for $\mathcal{P}(\sigma)$, where

• $S = O(sk \cdot \mathrm{polylog}(m, n, \delta^{-1}))$,

• $W = O(wk \cdot \mathrm{polylog}(m, n, \delta^{-1}))$.

Indeed, given $\varepsilon \ge 0$, $\delta > 0$, a common random string $r$, and a data stream problem $\mathcal{P}$, our algorithm is a probabilistic data stream algorithm (denote it by $A$), which has oracle access to $r$. The algorithm simultaneously gets two streams: an input stream $\sigma$ and a proof stream $\varphi$, to each of which it has sequential, one-pass access. According to Claim 4.3, the total size of the proof stream $\varphi$ is $O(wk \cdot \mathrm{polylog}(m, n, \delta^{-1}))$.

As for the space complexity of $A$, note that $A$ stores $O(\log p)$ random values $\{\xi_q\}_{q \in Q}$ of size $O(\log p)$ each, which takes $\mathrm{polylog}(m, n, \delta^{-1})$ bits of space. In addition it uses $\mathrm{polylog}(m, n, \delta^{-1})$ bits of space for computing

\[ \{\hat\omega_q(\xi_q)\}_{q \in Q} \quad \text{and} \quad \left\{\sum_{x \in \mathcal{D}_w(\mathbb{H}_q)} \hat\omega_q(x)\right\}_{q \in Q}. \]

Observe that these values can be computed incrementally using a sequential, one-pass access to $\varphi$, simply by evaluating the polynomials monomial-by-monomial. According to Lemma 4.5, each of the $O(\log p)$ algorithms $\{A_q\}_{q \in Q}$ we run in parallel takes

\[ O\left(sk \cdot \mathrm{polylog}(m, n, \delta^{-1})\right) \]

bits of space. Thus the total space complexity is $S = O(sk \cdot \mathrm{polylog}(m, n, \delta^{-1}))$.

Recall that the only time that the algorithm used the common random string $r$ is while building the approximation polynomial for the disjunction in each of $\{\omega_q\}_{q \in Q}$. Since we constructed $|Q|$ such polynomials, and by Lemma 4.2, the total number of random bits we read from $r$ is $\mathrm{polylog}(m, n, \delta^{-1})$. Furthermore, $A$ also uses only $\mathrm{polylog}(m, n, \delta^{-1})$ private random coin tosses, as the only randomness it needs is for the selection of random $\xi_q \in \mathbb{H}_q$ for every $q \in Q$. Thus, the total randomness complexity of the algorithm is $\mathrm{polylog}(m, n, \delta^{-1})$.

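We finish the proof by showing the correctness of the algorithm:

1. Completeness: Assuming Merlin is honest, i.e., $\hat\omega_q = \omega_q$ for every $q \in Q$; then by Claim 4.4 we can calculate the value of the sum (4.3) with probability $1 - \delta/2$ over the common random string $r$, and by (4.3) this value, which $A$ returns, satisfies

\[ (1 - \varepsilon)\,\mathcal{P}(\sigma) \le A(\sigma, \varphi) \le (1 + \varepsilon)\,\mathcal{P}(\sigma). \]

Hence:

\[ \Pr\left[ \left| \frac{A(\sigma, \varphi)}{\mathcal{P}(\sigma)} - 1 \right| \le \varepsilon \right] \ge 1 - \frac{\delta}{2}. \]

2. Soundness: If Merlin is dishonest, i.e., there exists $q \in Q$ for which $\hat\omega_q \ne \omega_q$, then by Lemma 4.6,

\[ \Pr\left[ A(\sigma, \varphi) \ne \bot \right] \le \frac{\delta}{2}, \]

where the probability is taken over the private random coin tosses that $A$ performs.

□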



5 The MA Communication Complexity of Gap Hamming Distance 

In this section we show that every MA communication complexity protocol for the Gap Hamming Distance problem (GHD) that communicates $T$ bits and uses a proof of length $W$ must satisfy $T \cdot W = \Omega(n)$, and therefore $T + W = \Omega(\sqrt{n})$.

In Section 6 we will use the lower bound on the MA communication complexity of GHD to show a lower bound on the MA streaming complexity of the Distinct Elements problem. We note that the lower bound on the MA communication complexity of GHD also implies a lower bound on the MA streaming complexity of computing the empirical entropy of a data stream (see [CBM06] for a formal definition of the Empirical Entropy problem).

For completeness, we show an MA communication complexity protocol for GHD that communicates $O(T \log n)$ bits and uses a proof of length $O(W \log n)$, for every $T \cdot W \ge n$. Thus we have a tight bound (up to logarithmic factors) of $T \cdot W = \Omega(n)$.

5.1 Lower bound 

In order to prove our lower bound on the MA communication complexity of Gap Hamming Distance, we first show a lower bound on the MA communication complexity of Gap Orthogonality, a problem wherein each party gets a vector in $\{-1, 1\}^n$ and needs to tell whether the vectors are nearly orthogonal, or far from being orthogonal. We then apply the reduction from the Gap Orthogonality problem to the Gap Hamming Distance problem (following [She11]), and obtain our lower bound.

Formally, the Gap Orthogonality problem is defined as follows: 

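Definition 5.1. Let $n$ be an integer, and let $\zeta_0, \zeta_1 > 0$. The Gap Orthogonality problem is the communication complexity problem of computing the partial Boolean function $\mathrm{ORT}_{n,\zeta_0,\zeta_1} : \{-1,1\}^n \times \{-1,1\}^n \to \{0,1\}$ given by

\[ \mathrm{ORT}_{n,\zeta_0,\zeta_1}(x, y) = \begin{cases} 1 & \text{if } |\langle x, y \rangle| \le \zeta_1 \\ 0 & \text{if } |\langle x, y \rangle| \ge \zeta_0 \end{cases} \]

Denote by $\mathrm{ORT}$ the function $\mathrm{ORT}_{n,\zeta_0,\zeta_1}$ for the fixed thresholds used in the remainder of this section.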


We restate the following theorem from [She11], which, given two finite sets $X, Y$, guarantees that if the inner product of a random vector from $X$ and a random vector from $Y$ is highly concentrated around 0, then $X \times Y$ must be a small rectangle.

Theorem 5.2. Let $\delta > 0$ be a sufficiently small constant, and let $X, Y \subseteq \{-1,1\}^n$ be two sets, such that

\[ \Pr\left[ |\langle x, y \rangle| > \frac{\sqrt{n}}{4} \right] < \delta \]

(where the probability is taken over selecting independently and uniformly at random $x \in X$ and $y \in Y$), then

\[ 4^{-n} |X| |Y| = e^{-\Omega(n)}. \]

Denote the uniform distribution on $\{-1,1\}^n \times \{-1,1\}^n$ by $\mu$. We get the next immediate corollary of Theorem 5.2.

Corollary 5.3. There exists a (sufficiently small) constant $\delta > 0$ such that for every rectangle $R \subseteq \{-1,1\}^n \times \{-1,1\}^n$ with $\mu(R) > 2^{-\delta n}$ we have

\[ \mu\left(R \cap \mathrm{ORT}^{-1}(0)\right) \ge \delta \mu(R). \]

Proof. Assume by contradiction that there exists a rectangle $R := X \times Y \subseteq \{-1,1\}^n \times \{-1,1\}^n$ with $\mu(R) > 2^{-\delta n}$ that satisfies

\[ \mu\left(R \cap \mathrm{ORT}^{-1}(0)\right) < \delta \mu(R). \tag{5.1} \]

Observe that

\[ \Pr\left[ |\langle x, y \rangle| \ge \frac{\sqrt{n}}{4} \right] = \frac{\mu\left(R \cap \mathrm{ORT}^{-1}(0)\right)}{\mu(R)} \]

(where the probability is taken over selecting independently and uniformly at random $x \in X$ and $y \in Y$). Hence we can write (5.1) as

\[ \Pr\left[ |\langle x, y \rangle| \ge \frac{\sqrt{n}}{4} \right] < \delta. \tag{5.2} \]

Note that

\[ \mu(R) = \frac{|X|}{2^n} \cdot \frac{|Y|}{2^n} = 4^{-n} |X| |Y|. \]

If we choose $\delta$ to be sufficiently small, then (5.2) guarantees the precondition of Theorem 5.2, and we get that $\mu(R) = e^{-\Omega(n)}$, in contradiction to the assumption that $\mu(R) > 2^{-\delta n}$. □

In particular, Corollary 5.3 implies that every rectangle $R \subseteq \{-1,1\}^n \times \{-1,1\}^n$ satisfies

\[ \mu\left(R \cap \mathrm{ORT}^{-1}(0)\right) \ge \delta \mu(R) - 2^{-\delta n}. \tag{5.3} \]

Next, using well known techniques (cf. [RS04]), we show a lower bound on the MA communication complexity of ORT, relying on Corollary 5.3. Formally, we prove

Theorem 5.4. Let $\varepsilon$ be a positive constant such that $\varepsilon < \frac{1}{2}$. For every $\mathcal{MA}_\varepsilon(T, W)$ communication complexity protocol for ORT we have $T \cdot W = \Omega(n)$, hence $T + W = \Omega(\sqrt{n})$.

Proof. Fix $n$. Denote $\mathcal{R} = \{-1,1\}^n \times \{-1,1\}^n$. Assume that there exists an $\mathcal{MA}_\varepsilon(T, W)$ communication complexity protocol for ORT; denote it by $\mathcal{P}$. By a simple amplification argument we get that there exists an $\mathcal{MA}_{\varepsilon'}(k, W)$ communication complexity protocol for ORT, where $k = O(T \cdot W)$ and $\varepsilon' = 2^{-CW}$ (for an arbitrarily large constant $C$); denote it by $\mathcal{P}'$.

Assume by contradiction that $k = o(n)$. We will show that our assumption that $k$ is asymptotically smaller than $n$ implies that the error probability of $\mathcal{P}'$ is greater than $2^{-CW}$, in contradiction.

Denote Merlin's proof, a binary string of size at most $W$ bits, by $w$. Denote the random string that $\mathcal{P}'$ uses by $s$. Denote by $R_{s,w,h} \subseteq \mathcal{R}$ the set of all input pairs $(x, y) \in \mathcal{R}$ such that the history of $(x, y, s, w)$ is $h$.$^5$ We state the following Lemma from [RS04]:

$^5$ For any input pair $(x, y) \in \mathcal{R}$, any assignment $s$ to the random string of $\mathcal{P}'$, and any assignment $w$ to the proof supplied to the players, the string of communication bits exchanged by the two players on the inputs $(x, y)$, using the random string $s$ and the proof $w$, is called the history of $(x, y, s, w)$.

Lemma 5.5. For every $s, w, h$ we have $R_{s,w,h} = X_{s,w,h} \times Y_{s,w,h}$ (where $X_{s,w,h} \subseteq \{-1,1\}^n$ and $Y_{s,w,h} \subseteq \{-1,1\}^n$), and for every $s, w$ the family $\{R_{s,w,h}\}_{h \in \{0,1\}^k}$ is a partition of $\mathcal{R}$.

Denote the answer that $\mathcal{P}'$ gives on $(x, y, s, w)$ by $\mathcal{P}'(x, y, s, w)$. Since the answer of $\mathcal{P}'$ on inputs in $R_{s,w,h}$ does not depend on $x$ and $y$, for every input pair in $R_{s,w,h}$ the answer $\mathcal{P}'(x, y, s, w)$ is the same; denote it by $\mathcal{P}'(s, w, h)$. Next, with $\zeta_0, \zeta_1$ denoting the thresholds of ORT, define $H_0 \subseteq \mathcal{R}$ to be the set of all input pairs $(x, y) \in \mathcal{R}$ such that

\[ |\langle x, y \rangle| \ge \zeta_0, \]

and define $H_1 \subseteq \mathcal{R}$ to be the set of all input pairs $(x, y) \in \mathcal{R}$ such that

\[ |\langle x, y \rangle| \le \zeta_1. \]

Note that if we choose $x = (x_1, \ldots, x_n) \in \{-1,1\}^n$ and $y = (y_1, \ldots, y_n) \in \{-1,1\}^n$ independently and uniformly at random, then for every $i \in [n]$ the product $x_i \cdot y_i$ is also uniformly distributed. Thus, if we choose $z \in \{-1,1\}^n$ uniformly at random, then

\[ \Pr_{(x,y) \in \mathcal{R}}\left[ |\langle x, y \rangle| \le \zeta_1 \right] = \Pr_{z \in \{-1,1\}^n}\left[ \left| \sum_{i=1}^{n} z_i \right| \le \zeta_1 \right] \ge c, \tag{5.4} \]

for some universal constant $c$.
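The probability in (5.4) can be computed exactly from binomial coefficients. The sketch below assumes, for illustration only, a threshold of $\lfloor\sqrt{n}\rfloor$ (any threshold of order $\sqrt{n}$ behaves similarly); the function name is illustrative.

```python
from math import comb, isqrt


def prob_sum_within(n, t):
    """Exact Pr[|z_1 + ... + z_n| <= t] for uniform z in {-1, 1}^n."""
    # If h coordinates equal +1 then the sum is 2h - n.
    favorable = sum(comb(n, h) for h in range(n + 1) if abs(2 * h - n) <= t)
    return favorable / 2 ** n


probs = {n: prob_sum_within(n, isqrt(n)) for n in (100, 400, 1600)}
```

For these values of $n$ the probability stays bounded away from 0, in line with the central limit theorem: the sum has standard deviation $\sqrt{n}$, so it lands within one standard deviation of 0 with constant probability.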



Next, for every rectangle $R \subseteq \mathcal{R}$, denote by $\alpha(R)$ the measure of $R$ in $\mathcal{R}$. Denote by $\beta_0(R)$ the measure of $R \cap H_0$ in $H_0$, and denote by $\beta_1(R)$ the measure of $R \cap H_1$ in $H_1$. Under these notations, we see that (5.3) implies that there exists a universal constant $\delta > 0$ such that for any rectangle $R \subseteq \mathcal{R}$ we have

\[ \beta_0(R) \ge \delta \cdot \alpha(R) - 2^{-\delta n}. \]

According to Equation (5.4) we know that $H_1$ is a set of probability at least $c$ in $\mathcal{R}$. Hence for every rectangle $R \subseteq \mathcal{R}$ we have $\beta_1(R) \le \frac{1}{c} \cdot \alpha(R)$. Therefore we have the following corollary,

Corollary 5.6. There exist universal constants $\delta, \delta' > 0$ such that every rectangle $R \subseteq \mathcal{R}$ satisfies

\[ \beta_0(R) \ge \delta' \cdot \beta_1(R) - 2^{-\delta n}. \]

For any $s, w$, denote by $A_0(s, w) \subseteq \mathcal{R}$ the union of all sets $R_{s,w,h}$ such that $\mathcal{P}'(s, w, h) = 0$, and denote by $A_1(s, w) \subseteq \mathcal{R}$ the union of all sets $R_{s,w,h}$ such that $\mathcal{P}'(s, w, h) = 1$. Observe that $A_0(s, w)$ and $A_1(s, w)$ are disjoint, and that $A_0(s, w) \cup A_1(s, w) = \mathcal{R}$.

Since each of $A_0(s, w)$ and $A_1(s, w)$ is a union of at most $2^k$ of the sets $X_{s,w,h} \times Y_{s,w,h}$, we see that Corollary 5.6 implies

\[ \beta_0(A_1(s, w)) \ge \delta' \cdot \beta_1(A_1(s, w)) - 2^k \cdot 2^{-\delta n} \ge \delta' \cdot \beta_1(A_1(s, w)) - o(2^{-W}). \tag{5.5} \]

Recall that $\beta_1(A_1(s, w))$ is the fraction of inputs $(x, y)$ in $H_1$ such that $\mathcal{P}'(x, y, s, w) = 1$, and that $H_1$ is the set of ones of the problem. Thus for every input $(x, y)$ in $H_1$ there exists $w$ such that $(x, y) \in A_1(s, w)$ with probability of at least $(1 - \varepsilon')$ over $s$. Since the number of possible proofs $w$ is at most $2^W$, by an averaging argument we get that there exists a proof that corresponds to at least a $2^{-W}$ fraction of the inputs in $H_1$. Formally speaking, there exists at least one binary string $w$ of size at most $W$, and a set $H_1' \subseteq H_1$ that satisfies $\beta_1(H_1') \ge 2^{-W}$, such that for every $(x, y) \in H_1'$,

\[ \Pr_s\left[ (x, y) \in A_1(s, w) \right] \ge 1 - \varepsilon'. \]

Therefore, there exists a constant $c_0$, such that with constant probability (over the random string $s$),

\[ \beta_1(A_1(s, w)) \ge 2^{-(W + c_0)}. \]

Hence, by (5.5), with constant probability (over the random string $s$),

\[ \beta_0(A_1(s, w)) \ge \delta' \cdot 2^{-W - c_0} - o(2^{-W}) \ge c_1 \delta' \cdot 2^{-W}, \]

for some constant $c_1$. However, recall that $\beta_0(A_1(s, w))$ is the fraction of inputs $(x, y)$ in $H_0$ for which $\mathcal{P}'(x, y, s, w)$ returns 1. Thus there exists a constant $c_2$ such that

\[ \Pr\left[ \mathcal{P}'(x, y, s, w) = 1 \right] \ge c_2 \delta' \cdot 2^{-W} \]

(where the probability is taken over both the random string $s$, and the uniform selection of $(x, y) \in H_0$). But $H_0$ is the set of zeros of the problem, so for every $(x, y) \in H_0$ the protocol answers 1 with probability at most $\varepsilon' \le 2^{-CW}$ (for an arbitrarily large constant $C$), which is a contradiction. □

We established that for every $\mathcal{MA}_\varepsilon(T, W)$ communication complexity protocol for ORT we have $T \cdot W = \Omega(n)$. According to the duplication argument in [She11], Theorem 5.4 implies the following corollary for slightly different parameters of the orthogonality problem.

Corollary 5.7. Let $\varepsilon$ be a positive constant such that $\varepsilon < \frac{1}{2}$. For every $\mathcal{MA}_\varepsilon(T, W)$ communication complexity protocol for $\mathrm{ORT}_{n,2\sqrt{n},\sqrt{n}}(x, y)$ we have $T \cdot W = \Omega(n)$, hence $T + W = \Omega(\sqrt{n})$.

Next, we state the following reduction from [She11] (rephrased):

Lemma 5.8. Let n G N be a perfect square. For every input x G {—1,1}" denote by x m 
(m G N) the string of length n ■ m that is composed of x concatenated to itself m — 1 times. 
Then, for every (x,y) G ORT^ ^O) U ORT^^l) we have 

ORT ni2v ^(x,y) = -GHD 10n+15 ^^(x 10 (-l) 15 ^,y 10 (+l) 15 ^) 

A GHD 10n+15v ^ iv ^ iv ^ (x 10 (+l) 15 ^,y 10 (+l) 15 ^) . 
Note that due to the symmetry of the gap Hamming distance problem, a protocol
for $\mathrm{GHD}_{10n+15\sqrt{n},\sqrt{n},\sqrt{n}}$ implies a protocol for $\neg\,\mathrm{GHD}_{10n+15\sqrt{n},\sqrt{n},\sqrt{n}}$. Hence, if we assume
by contradiction that there exists an $\mathcal{MA}_\epsilon(T, W)$ communication complexity protocol for
$\mathrm{GHD}_{10n+15\sqrt{n},\sqrt{n},\sqrt{n}}$, where $0 < \epsilon < \frac{1}{2}$ and $T \cdot W = o(n)$ (which in turn implies that there
exists an $\mathcal{MA}_\epsilon(T, W)$ communication complexity protocol for $\neg\,\mathrm{GHD}_{10n+15\sqrt{n},\sqrt{n},\sqrt{n}}$, where
$0 < \epsilon < \frac{1}{2}$ and $T \cdot W = o(n)$), then by applying Lemma 5.8 we get an $\mathcal{MA}_{2\epsilon}(T, W)$
communication complexity protocol for $\mathrm{ORT}_{n,2\sqrt{n},\sqrt{n}}(x,y)$ such that $T \cdot W = o(n)$, in contradiction
to Corollary 5.7. Thus we get the following corollary.

Corollary 5.9. Let $\epsilon$ be a positive constant such that $\epsilon < \frac{1}{2}$. For every $\mathcal{MA}_\epsilon(T, W)$
communication complexity protocol for $\mathrm{GHD}_{10n+15\sqrt{n},\sqrt{n},\sqrt{n}}$ we have $T \cdot W = \Omega(n)$, hence
$T + W = \Omega(\sqrt{n})$.



Finally, we note that previous work [CR11] provided a toolkit of simple reductions
that can be used to generalize a lower bound on the communication complexity of gap
Hamming distance to every reasonable parameter setting. Specifically, a lower bound for
$\mathrm{GHD}_{10n+15\sqrt{n},\sqrt{n},\sqrt{n}}$ implies a lower bound for $\mathrm{GHD} = \mathrm{GHD}_{n,\sqrt{n},\sqrt{n}}$. Moreover, we note that
their reduction is directly robust for $\mathcal{MA}$ communication complexity; thus we conclude:

Theorem 5.10. Let $\epsilon$ be a positive constant such that $\epsilon < \frac{1}{2}$. For every $\mathcal{MA}_\epsilon(T, W)$
communication complexity protocol for GHD we have $T \cdot W = \Omega(n)$, hence $T + W = \Omega(\sqrt{n})$.



5.2 Upper bound 



In their seminal paper, Aaronson and Wigderson [AW09] showed an $\mathcal{MA}$ communication
complexity protocol for the disjointness problem, wherein the communication complexity is
$O(\sqrt{n} \log n)$, and the size of the proof is also $O(\sqrt{n} \log n)$.

We modify their protocol in order to show an $\mathcal{MA}$ communication complexity protocol
for GHD, wherein the communication complexity is $O(T \log n)$, and the size of the proof is
$O(W \log n)$, for every $T \cdot W \ge n$.

Theorem 5.11. Let $T, W \in \mathbb{N}$ such that $T \cdot W \ge n$. Then, there exists an explicit
$\mathcal{MA}_{1/3}(T \log n, W \log n)$ communication complexity protocol for GHD.

Proof. Let $T, W \in \mathbb{N}$ such that $T \cdot W \ge n$. Assume for simplicity and without loss of
generality that $T \cdot W = n$ exactly. Let $a := (a_1, \ldots, a_n) \in \{-1,1\}^n$ be the input of Alice,
and $b := (b_1, \ldots, b_n) \in \{-1,1\}^n$ be the input of Bob. Let each player define a bivariate
function that represents its input; more precisely, let Alice define $f_a : [W] \times [T] \to \{-1,1\}$
by

$$f_a(x,y) = a_{(x-1)T+y},$$

and similarly, let Bob define $f_b : [W] \times [T] \to \{-1,1\}$ by

$$f_b(x,y) = b_{(x-1)T+y}.$$

Fix a prime $q \in [6n, 12n]$. Note that $f_a$ and $f_b$ have unique extensions $\tilde{f}_a : \mathbb{F}_q^2 \to \mathbb{F}_q$ and
$\tilde{f}_b : \mathbb{F}_q^2 \to \mathbb{F}_q$ (respectively) as polynomials of degree $(W - 1)$ in the first variable, and degree
$(T - 1)$ in the second variable. Next, define the polynomial $s : \mathbb{F}_q \to \mathbb{F}_q$ by

$$s(x) = \sum_{y \in [T]} \tilde{f}_a(x,y)\, \tilde{f}_b(x,y).$$

Note that the degree of $s$ is at most $2(W - 1)$. Denote the Hamming distance of $a$ and $b$ by
$\mathrm{HD}(a,b)$. Then,

$$\mathrm{HD}(a,b) = \frac{n - \sum_{x \in [W]} s(x)}{2}. \tag{5.6}$$

Thus, it is sufficient for one of the players to know $s$ in order to compute the Hamming
distance. We define the following $\mathcal{MA}$ communication complexity protocol:



1. Merlin sends Alice a message that consists of the coefficients of a polynomial $s' : \mathbb{F}_q \to \mathbb{F}_q$ of
degree at most $2(W - 1)$, for which Merlin claims that $s' = s$.

2. Bob uniformly picks $r \in \mathbb{F}_q$, and sends Alice a message that consists of $r$ and
$\tilde{f}_b(r,1), \ldots, \tilde{f}_b(r,T)$.

3. Alice computes $s(r) = \sum_{y \in [T]} \tilde{f}_a(r,y)\, \tilde{f}_b(r,y)$ and $s'(r)$. If $s(r) = s'(r)$, Alice computes
$\mathrm{HD}(a,b) = \frac{n - \sum_{x \in [W]} s'(x)}{2}$ and returns the result. Otherwise, Alice rejects the proof and
returns $\bot$.



Figure 2: $\mathcal{MA}$ Communication Complexity Protocol for GHD



Note that Merlin sends the coefficients of a polynomial of degree at most $2(W - 1)$ over
a finite field of cardinality $O(n)$. Hence the size of the proof is $O(W \log n)$. In addition,
note that the entire communication between Alice and Bob consists of sending the element
$r$ and the $T$ evaluations of $\tilde{f}_b$ (in step 2 of the algorithm). Hence the total communication
complexity is $O(T \log n)$.

If Merlin is honest, then Alice can directly compute $\mathrm{HD}(a,b)$ with probability 1, as
according to (5.6) the Hamming distance of $a$ and $b$ can be inferred from $s$. Otherwise, if
$s' \ne s$ then by the Schwartz-Zippel Lemma

$$\Pr[s(r) = s'(r)] \le \frac{2(W-1)}{q} \le \frac{1}{3}$$

(where the probability is taken over the random selection of $r \in \mathbb{F}_q$). Thus the test fails with
probability at least 2/3. □
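The protocol of Figure 2 can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Merlin's polynomial $s'$ is represented by its values at the nodes $1, \ldots, 2W-1$ instead of by its coefficients (an equivalent description of a polynomial of degree at most $2(W-1)$, chosen here only to keep the sketch short), and all function names are hypothetical. Since $q \ge 6n > 2n$, the sum $\sum_{x \in [W]} s'(x)$ over $\mathbb{F}_q$ determines the integer inner product $\langle a,b \rangle \in [-n, n]$.

```python
import random

def lagrange_eval(values, r, q):
    """Evaluate at r the unique polynomial of degree < len(values) over F_q
    that takes values[i-1] at node i, for i = 1..len(values)."""
    k = len(values)
    total = 0
    for i in range(1, k + 1):
        num, den = 1, 1
        for j in range(1, k + 1):
            if j != i:
                num = num * (r - j) % q
                den = den * (i - j) % q
        total = (total + values[i - 1] * num * pow(den, -1, q)) % q
    return total

def row_evals(v, W, T, r, q):
    """Return [f~_v(r, y) for y in 1..T], where f_v(x, y) = v_{(x-1)T+y}."""
    return [lagrange_eval([v[(x - 1) * T + (y - 1)] % q for x in range(1, W + 1)], r, q)
            for y in range(1, T + 1)]

def s_at(a, b, W, T, r, q):
    """s(r) = sum over y in [T] of f~_a(r, y) * f~_b(r, y)."""
    fa, fb = row_evals(a, W, T, r, q), row_evals(b, W, T, r, q)
    return sum(u * v for u, v in zip(fa, fb)) % q

def honest_proof(a, b, W, T, q):
    """Merlin: values of s at nodes 1..2W-1 (these determine s)."""
    return [s_at(a, b, W, T, x, q) for x in range(1, 2 * W)]

def alice_verify(a, fb_r, r, proof, W, T, q):
    """Alice: accept iff s'(r) == s(r); on acceptance return HD(a, b), else None."""
    n = W * T
    fa_r = row_evals(a, W, T, r, q)
    s_r = sum(u * v for u, v in zip(fa_r, fb_r)) % q
    if lagrange_eval(proof, r, q) != s_r:
        return None  # reject the proof
    total = sum(proof[:W]) % q                 # sum_{x in [W]} s'(x) = <a, b> mod q
    ip = total if total <= n else total - q    # lift to the integer in [-n, n]
    return (n - ip) // 2

W, T = 3, 4
n = W * T
q = 73  # a prime in [6n, 12n] = [72, 144]
rng = random.Random(0)
a = [rng.choice((-1, 1)) for _ in range(n)]
b = [rng.choice((-1, 1)) for _ in range(n)]

proof = honest_proof(a, b, W, T, q)   # step 1: Merlin -> Alice
r = rng.randrange(q)                  # step 2: Bob picks r ...
fb_r = row_evals(b, W, T, r, q)       # ... and sends r with T field elements
print(alice_verify(a, fb_r, r, proof, W, T, q))  # step 3
```

In this sketch Bob's message consists of $T + 1$ field elements and the proof of $2W - 1$ field elements, in line with the $O(T \log n)$ and $O(W \log n)$ bounds above.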
6 The AM Streaming Complexity of Distinct Elements

In this section we show an application of the canonical AM streaming algorithm for the 
Distinct Elements problem. In the regular data stream model (without any probabilistic 
proof system), it is well known (cf. [Mut05]) that the space complexity of the Distinct
Elements problem is lower bounded by the size of the alphabet of the data stream (for 
sufficiently long data streams). In contrast, using the canonical AM streaming algorithm 
we show that by allowing AM proofs, we can obtain a tradeoff between the space complexity 
and the size of the proof. 

Furthermore, we then rely on our lower bound on the MA communication complexity of 
the GHD problem, in order to show a matching lower bound on the MA streaming complexity 
of Distinct Elements. 

6.1 Upper bound

We show that for every $s, w \in \mathbb{N}$ such that $s \cdot w \ge n$ (where $n$ is the size of the alphabet)
there exists an AM streaming algorithm for the Distinct Elements problem that uses a proof
of size $\tilde{O}(w)$ and a space complexity $\tilde{O}(s)$. For example, by fixing $w = n$, we have an AM
streaming algorithm for the Distinct Elements problem that uses only a polylogarithmic (in
the size of the alphabet and the length of the stream) number of bits of space.

Formally, we show:

Theorem 6.1. For every $s, w \in \mathbb{N}$ such that $s \cdot w \ge n$, there exists an explicit
$\mathrm{AM}_{0,1/3}(s \cdot \mathrm{polylog}(m,n),\, w \cdot \mathrm{polylog}(m,n))$ streaming algorithm for the Distinct Elements
problem, given a data stream $\sigma = (a_1, \ldots, a_m)$ with alphabet $[n]$.

The idea behind the proof of Theorem 6.1 is simply noting that we can indicate whether
an element $j$ appears in the stream by the disjunction of the element indicators of $j \in [n]$ in
all of the positions of the stream (i.e., $\chi_1(j), \ldots, \chi_m(j)$). Then we can represent the number
of distinct elements as a sum of disjunctions, and use the canonical AM streaming algorithm
in order to solve the Distinct Elements problem. Formally,

Proof. Recall that the Distinct Elements problem is the data stream problem of computing
(exactly) the following function:

$$F_0(\sigma) = \left|\{\, i \in [n] : \exists j \in [m],\ a_j = i \,\}\right|.$$

Observe that for every data stream $\sigma$ we can write $F_0(\sigma)$ as

$$\sum_{j=1}^{n} \big(\chi_1(j) \vee \chi_2(j) \vee \cdots \vee \chi_m(j)\big).$$

Let $\sigma = (a_1, \ldots, a_m)$ be a data stream with alphabet $[n]$. Let $s, w \in \mathbb{N}$ such that
$s \cdot w \ge n$, let $\epsilon = 0$, and let $\delta = 1/3$. By Theorem 4.1 we have an explicit $\mathrm{AM}_{\epsilon,\delta}(S, W)$-
streaming algorithm for computing $F_0(\sigma)$, where $S = O(s \cdot \mathrm{polylog}(m,n))$ and $W = O(w \cdot
\mathrm{polylog}(m,n))$. □
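The sum-of-disjunctions representation used in the proof can be checked directly on a small stream. The sketch below is only an offline sanity check of the identity (it stores the whole stream and is not the canonical AM streaming algorithm); the names `chi` and `f0_as_sum_of_disjunctions` are hypothetical.

```python
def chi(a_i, j):
    """Element indicator chi_i(j): 1 if the stream element a_i equals j, else 0."""
    return 1 if a_i == j else 0

def f0_as_sum_of_disjunctions(stream, n):
    """F_0 = sum over j in [n] of (chi_1(j) OR chi_2(j) OR ... OR chi_m(j))."""
    return sum(int(any(chi(a_i, j) for a_i in stream)) for j in range(1, n + 1))

stream = [3, 1, 3, 5, 1]   # a stream over the alphabet [6]
print(f0_as_sum_of_disjunctions(stream, 6), len(set(stream)))
```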
6.2 Lower bound



In the rest of this section we consider the $\mathcal{MA}$ model in which the verifier has free access
to the proof. As we mentioned in Section 3, this model is stronger than the $\mathcal{MA}$ model in
which the proof is streamed. Hence the lower bound we prove holds for both
models.



As implicitly shown in [IW03], the communication complexity problem of GHD reduces to
the data stream problem of Distinct Elements. We note that the foregoing reduction can be
adapted in order to reduce the $\mathcal{MA}$ communication complexity problem of GHD to the $\mathcal{MA}$
problem of approximating the number of distinct elements in a stream within a multiplicative
factor of $1 \pm 1/\sqrt{n}$. Together with our lower bound on the $\mathcal{MA}$ communication complexity
of GHD, this implies the following:



Theorem 6.2. Let $\delta < \frac{1}{2}$. For every $\mathcal{MA}_{\frac{1}{\sqrt{n}},\delta}(S, W)$ streaming algorithm for approximating
the number of distinct elements in a data stream $\sigma = (a_1, \ldots, a_m)$ (over alphabet $[n]$) we
have $S \cdot W = \Omega(n)$, hence $S + W = \Omega(\sqrt{n})$.

Proof. Let Alice hold a string $x \in \{-1,1\}^n$ and Bob hold $y \in \{-1,1\}^n$. Alice can convert her
string $x = (x_1, \ldots, x_n)$ to a data stream over the alphabet $\Sigma = \{(i,b) \mid i \in [n],\ b \in \{-1,1\}\}$
in the following manner:

$$\sigma_A = ((1,x_1), (2,x_2), \ldots, (n,x_n)).$$

Similarly, Bob can convert $y = (y_1, \ldots, y_n)$ to

$$\sigma_B = ((1,y_1), (2,y_2), \ldots, (n,y_n)).$$

Observe that all of the elements in $\sigma_A$ are distinct, and that all of the elements in $\sigma_B$ are
also distinct. In addition, note that the only way in which an element can appear twice in
the concatenation of the streams is if $x_i = y_i$ for some $i \in [n]$. In fact, if we denote the
number of distinct elements in $\sigma_A \circ \sigma_B$ by $d$, and denote the Hamming distance of $x$ and $y$
by $\mathrm{HD}(x,y)$, then we have the following relation:

$$d = n + \mathrm{HD}(x,y). \tag{6.1}$$
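As a small sanity check of relation (6.1), the following sketch builds the two streams for a concrete pair of strings and counts distinct elements offline. The helper names `to_stream` and `hamming` are hypothetical, and the exact count via a set is used only for illustration.

```python
def to_stream(z):
    """Convert z = (z_1, ..., z_n) into the stream ((1, z_1), ..., (n, z_n))."""
    return [(i, z_i) for i, z_i in enumerate(z, start=1)]

def hamming(x, y):
    """Hamming distance of two equal-length strings."""
    return sum(u != v for u, v in zip(x, y))

x = [1, -1, 1, 1, -1]
y = [1, 1, 1, -1, -1]
n = len(x)

sigma_a, sigma_b = to_stream(x), to_stream(y)
d = len(set(sigma_a + sigma_b))   # distinct elements of the concatenated stream
print(d, n + hamming(x, y))
```

Each index $i$ contributes one distinct element when $x_i = y_i$ and two otherwise, which is exactly the count $n + \mathrm{HD}(x,y)$.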



Alice and Bob can simulate running an $\mathcal{MA}$ streaming algorithm on the concatenation
of their inputs by using a one-way $\mathcal{MA}$ communication complexity protocol, such that
the number of bits that are communicated during the execution of the protocol
is exactly the same as the number of bits of space that are used by the simulated $\mathcal{MA}$
streaming algorithm. Details follow.

Say we have an $\mathcal{MA}_{\frac{1}{\sqrt{n}},\delta}(S, W)$ streaming algorithm $A$ for approximating the number
of distinct elements in $\sigma_A \circ \sigma_B$. Alice can run $A$ on $\sigma_A$, using a proof $w$ of size $W$. After
the algorithm finishes processing the last element of $\sigma_A$, Alice sends the current state of her
memory (which consists of at most $S$ bits) to Bob. Next, Bob sets his memory to the state
that Alice sent, uses the proof $w$, and completes the run of $A$ over $\sigma_B$. Note that the
total communication during the execution of the aforementioned protocol is at most $S$ bits,
as the data stream algorithm uses at most $S$ bits of space during its execution.
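The state hand-off described above can be sketched as follows. The class `ToyDistinctCounter` is a hypothetical stand-in for the streaming algorithm $A$: it counts exactly using a set, so it is not sublinear and it ignores the proof $w$ (which both players would read in the actual protocol). The point of the sketch is only that the single message from Alice to Bob is the serialized memory state, so the communication equals the space used.

```python
import pickle

class ToyDistinctCounter:
    """Hypothetical stand-in for a streaming algorithm; its memory is one set."""

    def __init__(self, state=None):
        # Start from empty memory, or resume from a serialized memory state.
        self.seen = set() if state is None else pickle.loads(state)

    def process(self, element):
        self.seen.add(element)

    def state(self):
        """Serialize the current memory contents."""
        return pickle.dumps(self.seen)

    def output(self):
        return len(self.seen)

def one_way_protocol(sigma_a, sigma_b):
    """Alice runs on sigma_a, sends her memory to Bob, and Bob finishes on sigma_b."""
    alice = ToyDistinctCounter()
    for element in sigma_a:
        alice.process(element)
    message = alice.state()              # the only communication: Alice -> Bob
    bob = ToyDistinctCounter(message)
    for element in sigma_b:
        bob.process(element)
    return bob.output(), len(message)    # answer, and message length in bytes

sigma_a = [(1, 1), (2, -1), (3, 1)]
sigma_b = [(1, 1), (2, 1), (3, -1)]
answer, message_bytes = one_way_protocol(sigma_a, sigma_b)
print(answer, message_bytes)
```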

As a conclusion, if there exists such $A$ then by the reduction above there exists an $\mathcal{MA}$
communication protocol that outputs a $1 \pm 1/\sqrt{n}$ multiplicative approximation of $d$. By
(6.1) we can compute $\widetilde{\mathrm{HD}}(x,y)$, such that

$$\mathrm{HD}(x,y) - \frac{n + \mathrm{HD}(x,y)}{\sqrt{n}} \le \widetilde{\mathrm{HD}}(x,y) \le \mathrm{HD}(x,y) + \frac{n + \mathrm{HD}(x,y)}{\sqrt{n}}$$

or, since $\mathrm{HD}(x,y) \le n$,

$$\mathrm{HD}(x,y) - 2\sqrt{n} \le \widetilde{\mathrm{HD}}(x,y) \le \mathrm{HD}(x,y) + 2\sqrt{n}.$$

Thus we can solve $\mathrm{GHD}_{n,2\sqrt{n},2\sqrt{n}}$ while communicating at most $O(S)$ bits and using a proof
of at most $O(W)$ bits. Hence, using the toolkit of reductions provided in [CR11] (see Section 5.1)
we can solve GHD, while communicating at most $O(S)$ bits and using a proof of at
most $O(W)$ bits. Thus, by Theorem 5.10 we have $S \cdot W = \Omega(n)$, hence $S + W = \Omega(\sqrt{n})$. □
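The additive error bound used in the proof can be checked numerically. The sketch below assumes the worst case in which the estimate of $d$ sits exactly at the boundary $(1 \pm 1/\sqrt{n})\,d$, subtracts $n$ as in (6.1), and compares the resulting error with $2\sqrt{n}$. The function name `worst_case_error` and the floating-point tolerance are assumptions of the sketch.

```python
import math

def worst_case_error(n, hd):
    """Largest |(d_est - n) - hd| over estimates d_est within (1 +/- 1/sqrt(n)) * d."""
    d = n + hd
    estimates = (d * (1 - 1 / math.sqrt(n)), d * (1 + 1 / math.sqrt(n)))
    return max(abs((d_est - n) - hd) for d_est in estimates)

n = 100
max_error = max(worst_case_error(n, hd) for hd in range(n + 1))
bound = 2 * math.sqrt(n)
print(max_error, bound)
```

The error equals $d/\sqrt{n}$, which is largest when $\mathrm{HD}(x,y) = n$ and $d = 2n$, where it reaches the bound $2\sqrt{n}$.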

Note that in particular, Theorem 6.2 implies a lower bound (with the same parameters)
on the MA streaming complexity of computing the exact number of distinct elements in a
stream.

Last, we also note that by a straightforward adaptation of the reduction from the
communication complexity problem of GHD to the data stream problem of Empirical Entropy
(see [CCM07]), our $\mathcal{MA}$ lower bound on GHD also implies an $\mathcal{MA}$ lower bound on the
Empirical Entropy problem.